Machine Learning Engineer at Apple (02/2020-present)
Research Scientist at RIT-Boston (08/2016-01/2020)
Rakuten Institute of Technology, Boston
Research Assistant at CMU (08/2014-08/2016)
Advisor: Prof. Eduard Hovy
Jointly trained two parallel deep neural networks, taking 19-layer VGG image features and word2vec sentence embeddings as inputs, respectively. The objective maximizes the cosine similarity of the two output representations for relevant image/sentence pairs under a mean squared error loss. Achieved state-of-the-art image-to-sentence retrieval results on the Flickr30k dataset (R@5 = 0.64, R@10 = 0.77).
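The training objective above can be sketched as follows: push the cosine similarity of the two tower outputs toward a target (1 for relevant pairs, 0 otherwise) with a squared-error loss. This is a minimal, framework-free illustration, not the original implementation; the vectors and targets are made-up toy values.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mse_cosine_loss(pairs):
    # pairs: list of (image_vec, sentence_vec, target), where target is
    # 1.0 for a relevant image/sentence pair and 0.0 for an irrelevant one.
    return sum((cosine(u, v) - t) ** 2 for u, v, t in pairs) / len(pairs)

# Toy 3-d "embeddings" standing in for the two towers' outputs.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # relevant, already aligned
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # irrelevant, orthogonal
]
print(mse_cosine_loss(pairs))  # 0.0: both pairs hit their targets exactly
```

In the actual system the loss would be backpropagated through both networks; here only the forward loss computation is shown.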
Led a team in the 2015 and 2016 TAC KBP Trilingual Entity Discovery and Linking Tracks. The task was to extract named entity mentions from a source collection of textual documents in multiple languages (English, Chinese, and Spanish) and link them to an existing Knowledge Base (Freebase). Proposed a graph-based model that jointly tackles entity recognition and entity linking.
Constructed PropStore, a multilingual (English, Chinese, and Spanish) propositional Knowledge Base of dependency relations between words, built from a Wikipedia dump. Proposed a novel Word Sense Disambiguation algorithm that combines POS-sensitive word2vec representations with distributional information derived from PropStore and the OntoNotes sense inventory.
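One common way to obtain POS-sensitive word2vec representations is to suffix each token with its POS tag before training, so that e.g. "bank_NOUN" and "bank_VERB" receive separate vectors. The sketch below shows only this preprocessing step under that assumption; the tagged sentence and tag set are illustrative, and the actual algorithm's details are not reproduced here.

```python
def pos_sensitive_tokens(tagged_sentence):
    # tagged_sentence: list of (word, pos) pairs, e.g. from a POS tagger.
    # Joining the lowercased word with its coarse POS tag gives each
    # syntactically distinct form its own vocabulary entry, so word2vec
    # trains a separate vector per word/POS combination.
    return [f"{word.lower()}_{pos}" for word, pos in tagged_sentence]

sent = [("The", "DET"), ("bank", "NOUN"), ("banks", "VERB"), ("left", "ADV")]
print(pos_sensitive_tokens(sent))
# ['the_DET', 'bank_NOUN', 'banks_VERB', 'left_ADV']
```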
Research Staff at JHU HLTCOE (SCALE) (06/2015-08/2015)
Advisor: Prof. James Mayfield and Prof. Mark Dredze
Focused on different models for Chinese entity linking: trained a Chinese-to-English machine translation package from scratch using Joshua, and improved Slinky, an entity linking tool that implements a highly parallel message-passing infrastructure using Akka and adopts an SVM learning-to-rank approach for entity disambiguation.
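The learning-to-rank approach mentioned above is typically realized as RankSVM: for each mention, pairs of candidate entities with different relevance are turned into difference vectors, and a linear SVM trained on those differences learns a scoring weight vector. This is a generic sketch of that pairwise transformation, not Slinky's actual code; the feature vectors and relevance labels are toy values.

```python
from itertools import combinations

def pairwise_rank_examples(candidates):
    # candidates: list of (feature_vector, relevance) for one query mention.
    # Emits (x_i - x_j, sign) training examples; a linear SVM fit on these
    # differences yields a weight vector that orders candidates (RankSVM).
    examples = []
    for (xi, yi), (xj, yj) in combinations(candidates, 2):
        if yi == yj:
            continue  # equal relevance carries no ordering information
        diff = [a - b for a, b in zip(xi, xj)]
        examples.append((diff, 1 if yi > yj else -1))
    return examples

# Three candidate entities for one mention, with graded relevance labels.
cands = [([0.9, 0.1], 2), ([0.4, 0.5], 1), ([0.2, 0.8], 0)]
print(len(pairwise_rank_examples(cands)))  # 3 ordered pairs
```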
NTU Speech Processing Lab (09/2001-06/2013)
Advisor: Prof. Lin-Shan Lee
Developed a C++ toolkit that implements the EM algorithm for Probabilistic Latent Semantic Analysis (PLSA) and Gibbs sampling for Latent Dirichlet Allocation (LDA). Applied the topic models to spectrogram analysis and extraction-based text summarization.
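The Gibbs sampler for LDA referenced above follows a standard collapsed form: each token's topic is resampled from its full conditional while doc-topic and topic-word count tables are kept in sync. This is a minimal Python illustration of that standard algorithm (the original toolkit was C++); the corpus, topic count, and hyperparameters are toy values.

```python
import random

def lda_gibbs(docs, K, iters=50, alpha=0.1, beta=0.01, seed=0):
    # Collapsed Gibbs sampling for LDA over tokenized documents.
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    z = [[rng.randrange(K) for _ in d] for d in docs]  # topic assignments
    ndk = [[0] * K for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid[w]] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    return z, ndk

docs = [["speech", "audio", "speech"], ["topic", "model", "topic"]]
z, ndk = lda_gibbs(docs, K=2)
print(ndk)  # per-document topic counts after sampling
```

The EM variant for PLSA has the same count-based flavor but alternates deterministic E and M steps instead of sampling.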
Performed Minimum Phone Error (MPE) training on merged acoustic units for transcribing Chinese-English code-switched lectures with a highly imbalanced language distribution. Significantly improved English recognition accuracy from 59.9% to 68.23%.