Cheers

NLP-based recommender system built with Scikit-Learn, TensorFlow, and Flask and deployed as a web app

For more technical details, see my GitHub repo. The demo below shows the main functionality. You can view my completed app here.


An interactive web app that allows users to type the flavors/characteristics they find appealing and shows five recommended beers and the user's taste profile.


I scraped thousands of consumer reviews and manufacturers' descriptions of their beers. These reviews and descriptions are highly informative because they contain words that describe the characteristics of each beer. I then used Natural Language Processing (NLP) to uncover relationships between these descriptive sentences and the beers. When a user enters their own search terms, the algorithm first transforms the unstructured text into high-dimensional vectors. Sentence embedding captures the context of the whole sentence in a single vector, which helps represent the intent behind the sentence. Once the search terms are vectorized, the algorithm classifies these vectors.
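The text-to-vector step can be illustrated with a minimal sketch. It is not the app's actual pipeline: bag-of-words counts stand in for the sentence embeddings, cosine-similarity ranking stands in for the trained classifier, and the beer names and descriptions in `BEERS` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical descriptions; the real app uses scraped reviews and descriptions.
BEERS = {
    "Hazy IPA": "juicy tropical citrus hops hazy",
    "Imperial Stout": "roasted coffee chocolate dark rich",
    "Hefeweizen": "banana clove wheat light refreshing",
}

def vectorize(text):
    """Bag-of-words stand-in for a sentence embedding: token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(query, beers=BEERS, k=5):
    """Return up to k beer names ranked by similarity to the query text."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(desc)), name) for name, desc in beers.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

Calling `recommend("dark chocolate coffee", k=1)` ranks the stout first because it shares the most words with the query. A sentence embedding would also match related words that do not appear verbatim.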


I tried a variety of models for the classification task (with and without transfer learning). You can see their performance below. Top-5 accuracy counts a prediction as correct if any of the model's five highest-probability answers matches the expected answer.
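As a small illustration of the metric, top-k accuracy can be computed as sketched here. The function name `top_k_accuracy` and the sample probabilities are made up for this example and are not taken from the project.

```python
def top_k_accuracy(probabilities, expected, k=5):
    """Fraction of samples whose expected label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, expected):
        # Class indices sorted from most to least probable, truncated to k.
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(expected)

# Hypothetical class probabilities for two samples over six classes.
probs = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],
]
labels = [3, 5]
```

With these values, the first sample counts as a top-5 hit but not a top-1 hit, and the second sample misses in both cases.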

StockApp

Stock Analysis App

Retrieves data from Alpha Vantage and presents the financial analysis using Pandas, Bokeh and Flask

For more details, see my GitHub repo. The demo below shows the main functionality. You can view my completed app here.

StockApp

Price plots: Opening, Highest, Lowest, Adjusted closing

Analysis plots: Candlestick Chart, Daily Returns, Monthly Returns, Yearly Returns, Annualized Volatility, Daily 12-1 Price Momentum Signal
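Some of the quantities behind these plots can be sketched in plain Python. This is an illustration under stated assumptions rather than the app's code (which uses Pandas): 252 trading days per year, 21 trading days per month, and a 12-1 momentum defined as the return from 12 months ago to 1 month ago.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year
MONTH = 21          # assumed number of trading days per month

def daily_returns(prices):
    """Simple day-over-day returns from a list of daily closing prices."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def annualized_volatility(prices):
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(daily_returns(prices)) * math.sqrt(TRADING_DAYS)

def momentum_12_1(prices):
    """Return from 12 months ago to 1 month ago, skipping the latest month."""
    if len(prices) <= TRADING_DAYS:
        raise ValueError("need more than one year of daily prices")
    return prices[-1 - MONTH] / prices[-1 - TRADING_DAYS] - 1
```

Skipping the most recent month is a common convention for momentum signals. The exact window lengths used in the app may differ.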

Senior Project

Clustering and Synchronizing of Audio Sequences

For more details on the scientific content of the project, see my GitHub page.

For my senior project, I developed a MATLAB application that synchronizes heavily corrupted recordings from multiple audio sources, with a core search time of a few seconds, using a landmark-based audio fingerprinting method.

Biases

By applying an FFT over short time windows of an audio sample, we can build a spectrogram of that sample. We can then find the peaks in amplitude, which you can think of as a fingerprint for the audio sample.
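The spectrogram-and-peaks idea can be sketched in Python (the project itself is in MATLAB). This is a simplified illustration with assumed window and hop sizes: a naive DFT stands in for the FFT, and only the single strongest bin per window is kept, whereas a landmark method would typically combine several peaks.

```python
import cmath
import math

def spectrogram(samples, window=64, hop=32):
    """Magnitude spectrum of each window, via a naive DFT (stand-in for an FFT)."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        spectrum = []
        for k in range(window // 2):  # non-negative frequency bins only
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / window)
                        for n, x in enumerate(chunk))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

def peak_landmarks(frames):
    """(frame index, frequency bin) of the strongest bin in each frame."""
    return [(t, max(range(len(s)), key=s.__getitem__)) for t, s in enumerate(frames)]

# Test tone with 8 cycles per 64-sample window, so the energy sits in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
landmarks = peak_landmarks(spectrogram(tone))
```

Two recordings of the same sound should produce matching peak patterns, and the time offset between matching patterns indicates how to align them.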

Econometrics Project

Schooling Aid vs College Attendance

For more details on the scientific content of the project, see my paper.

For the final project of BC's graduate-level Applied Econometrics course, ECON 8823, I examined the effect of schooling aid on college attendance. I used a difference-in-differences (DID) estimator for identification. It is a useful technique in impact evaluation when randomization at the individual level is not possible.

DID
  • The research question is whether schooling aid increases college attendance or merely subsidizes students who would have gone to college regardless of aid.
  • Examining the effect of schooling aid on college attendance is challenging because treated and untreated students can differ in unobservable characteristics correlated with potential outcomes, even controlling for differences in observed characteristics.
  • However, we can use a difference-in-differences estimator as the identification strategy. The Social Security Administration has provided benefits to the children of deceased, disabled, and retired Social Security beneficiaries until those children are 18.
  • Using difference-in-differences analysis, we find that the availability of $1,000 of grant aid (normalized to $2856) increases college attendance by 0.167 years and the probability of attending by 3.8%.
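The basic two-group, two-period form of the difference-in-differences estimator can be sketched as below. The function names and the attendance indicators are hypothetical and are not the data or the regression specification used in the paper.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical college-attendance indicators (1 = attended, 0 = did not).
treated_before = [1, 1, 0, 1]
treated_after = [0, 1, 0, 1]
control_before = [1, 0, 1, 0]
control_after = [1, 0, 1, 0]

effect = did_estimate(treated_before, treated_after, control_before, control_after)
```

Subtracting the control group's change removes trends that affect both groups, under the assumption that the two groups would otherwise have moved in parallel.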

PhD Thesis

Affirmative Action in Two Dimensions: A Multi-Period Apportionment Problem

For more details on the content of the project, see my paper. Code and data associated with this paper are available on my GitHub page.


This project develops an applied algorithm using statistics and graph theory to improve social and economic impact. Court objections and public protests in India inspired us to formalize an affirmative action issue. We document both merits and faults in the debated proposals. We present an alternative solution and measure its impact by running simulations on 12k jobs. The result of the empirical case study is shown below.

thesis

Abstract: In many settings affirmative action policies apply at two levels simultaneously, for instance, at a university as well as at its departments. We show that commonly used methods in reserving positions for beneficiaries of affirmative action are often inadequate in such settings. We present a comprehensive evaluation of existing procedures to formally document their shortcomings. We propose a new solution with appealing theoretical properties and quantify the benefits of adopting it using recruitment advertisement data from India.

The solution is built around a network flow algorithm that takes a flow network as input and randomly constructs another flow network with fewer fractional flows as its output. By iterative application of this algorithm, a flow network with integral flows is generated.
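The idea of repeatedly reducing the number of fractional values at random can be illustrated with a much simpler analog. The sketch below is not the paper's network flow algorithm: it rounds a list of fractions in [0, 1] with an integral sum (a single conservation constraint instead of a full flow network). Each step makes at least one more entry integral while keeping the total and every entry's expected value unchanged. All names are hypothetical.

```python
import random
from fractions import Fraction

def round_pair(x, i, j, rng):
    """Shift mass between two fractional entries so at least one becomes 0 or 1."""
    up = min(1 - x[i], x[j])    # largest amount that can move from j to i
    down = min(x[i], 1 - x[j])  # largest amount that can move from i to j
    # Probabilities are chosen so the expected change of each entry is zero.
    if rng.random() < down / (up + down):
        x[i] += up
        x[j] -= up
    else:
        x[i] -= down
        x[j] += down

def randomized_round(values, seed=None):
    """Iterate pairwise rounding until fewer than two entries are fractional."""
    rng = random.Random(seed)
    x = [Fraction(v) for v in values]
    while True:
        fractional = [k for k, v in enumerate(x) if v.denominator != 1]
        if len(fractional) < 2:
            return x
        round_pair(x, fractional[0], fractional[1], rng)

shares = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(1)]
rounded = randomized_round(shares, seed=0)
```

Exact `Fraction` arithmetic is used so that entries become exactly 0 or 1. Because the total is preserved and is an integer here, no single fractional entry can be left over.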

thesis2