Machine Learning Engineer at Apple (02/2020-present)
Research Scientist at RIT-Boston (08/2016-01/2020)
Rakuten Institute of Technology, Boston
Research Assistant at CMU (08/2014-08/2016)
Advisor: Prof. Eduard Hovy
Jointly trained two parallel deep neural networks, taking 19-layer VGG image features and word2vec sentence embeddings as inputs, respectively. The objective maximizes the cosine similarity of the two output representations for relevant image/sentence pairs under a mean squared error loss. Achieved state-of-the-art image-to-sentence retrieval results on the Flickr30k dataset (R@5 = 0.64, R@10 = 0.77).
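The training objective above can be sketched as follows: push the cosine similarity of the two tower outputs toward a target (1 for relevant pairs, 0 otherwise) with a squared-error loss. This is a minimal, framework-free illustration, not the original implementation; the vectors and targets are made-up toy values.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mse_cosine_loss(pairs):
    # pairs: list of (image_vec, sentence_vec, target), where target is
    # 1.0 for a relevant image/sentence pair and 0.0 for an irrelevant one.
    return sum((cosine(u, v) - t) ** 2 for u, v, t in pairs) / len(pairs)

# Toy 3-d "embeddings" standing in for the two towers' outputs.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # relevant, already aligned
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # irrelevant, orthogonal
]
print(mse_cosine_loss(pairs))  # 0.0: both pairs hit their targets exactly
```

In the actual system the loss would be backpropagated through both networks; here only the forward loss computation is shown.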
Led a team in the 2015 and 2016 TAC KBP Trilingual Entity Discovery and Linking Tracks. The task was to extract named entity mentions from a source collection of textual documents in multiple languages (English, Chinese, and Spanish) and link them to an existing Knowledge Base (Freebase). Proposed a graph-based model that jointly tackles entity recognition and entity linking.
Constructed PropStore, a multilingual (English, Chinese, and Spanish) propositional Knowledge Base of dependency relations between words, built from a Wikipedia dump. Proposed a novel Word Sense Disambiguation algorithm that combines POS-sensitive word2vec representations with distributional information derived from PropStore and the OntoNotes sense inventory.
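One common way to obtain POS-sensitive word2vec representations is to suffix each token with its POS tag before training, so that e.g. "bank_NOUN" and "bank_VERB" receive separate vectors. The sketch below shows only this preprocessing step under that assumption; the tagged sentence and tag set are illustrative, and the actual algorithm's details are not reproduced here.

```python
def pos_sensitive_tokens(tagged_sentence):
    # tagged_sentence: list of (word, pos) pairs, e.g. from a POS tagger.
    # Joining the lowercased word with its coarse POS tag gives each
    # syntactically distinct form its own vocabulary entry, so word2vec
    # trains a separate vector per word/POS combination.
    return [f"{word.lower()}_{pos}" for word, pos in tagged_sentence]

sent = [("The", "DET"), ("bank", "NOUN"), ("banks", "VERB"), ("left", "ADV")]
print(pos_sensitive_tokens(sent))
# ['the_DET', 'bank_NOUN', 'banks_VERB', 'left_ADV']
```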
Research Staff at JHU HLTCOE (SCALE) (06/2015-08/2015)
Advisor: Prof. James Mayfield and Prof. Mark Dredze
Focused on different models for Chinese entity linking: trained a Chinese-to-English machine translation package from scratch using Joshua, and improved Slinky, an entity linking tool that implements a highly parallel message-passing infrastructure using Akka and adopts an SVM learning-to-rank approach for entity disambiguation.
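The learning-to-rank approach mentioned above is typically realized as RankSVM: for each mention, pairs of candidate entities with different relevance are turned into difference vectors, and a linear SVM trained on those differences learns a scoring weight vector. This is a generic sketch of that pairwise transformation, not Slinky's actual code; the feature vectors and relevance labels are toy values.

```python
from itertools import combinations

def pairwise_rank_examples(candidates):
    # candidates: list of (feature_vector, relevance) for one query mention.
    # Emits (x_i - x_j, sign) training examples; a linear SVM fit on these
    # differences yields a weight vector that orders candidates (RankSVM).
    examples = []
    for (xi, yi), (xj, yj) in combinations(candidates, 2):
        if yi == yj:
            continue  # equal relevance carries no ordering information
        diff = [a - b for a, b in zip(xi, xj)]
        examples.append((diff, 1 if yi > yj else -1))
    return examples

# Three candidate entities for one mention, with graded relevance labels.
cands = [([0.9, 0.1], 2), ([0.4, 0.5], 1), ([0.2, 0.8], 0)]
print(len(pairwise_rank_examples(cands)))  # 3 ordered pairs
```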
NTU Speech Processing Lab (09/2001-06/2013)
Advisor: Prof. Lin-Shan Lee
Developed a C++ toolkit that implements the EM algorithm for Probabilistic Latent Semantic Analysis (PLSA) and Gibbs sampling for Latent Dirichlet Allocation (LDA). Applied the topic models to spectrogram analysis and extraction-based text summarization.
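The Gibbs sampler for LDA referenced above follows a standard collapsed form: each token's topic is resampled from its full conditional while doc-topic and topic-word count tables are kept in sync. This is a minimal Python illustration of that standard algorithm (the original toolkit was C++); the corpus, topic count, and hyperparameters are toy values.

```python
import random

def lda_gibbs(docs, K, iters=50, alpha=0.1, beta=0.01, seed=0):
    # Collapsed Gibbs sampling for LDA over tokenized documents.
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    z = [[rng.randrange(K) for _ in d] for d in docs]  # topic assignments
    ndk = [[0] * K for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid[w]] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    return z, ndk

docs = [["speech", "audio", "speech"], ["topic", "model", "topic"]]
z, ndk = lda_gibbs(docs, K=2)
print(ndk)  # per-document topic counts after sampling
```

The EM variant for PLSA has the same count-based flavor but alternates deterministic E and M steps instead of sampling.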
Performed Minimum Phone Error (MPE) training on merged acoustic units for transcribing Chinese-English code-switched lectures with a highly imbalanced language distribution. Significantly improved English recognition accuracy from 59.9% to 68.23%.