About Me
My experience has instilled in me a strong desire to solve difficult challenges with innovative data-driven solutions. I’m enthusiastic about projects that benefit the community, and I’m eager to assist businesses in developing better products by using data to derive insights.
Tech Stack:
- Programming Languages: Python, Java, JavaScript, Shell Script, SQL
- Database Management Tools: PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, Weaviate
- Technology: Adobe Experience Manager (AEM), MuleSoft, Docker, AWS, Jupyter Notebook, Git, Tableau
- Python Packages: Pandas, NumPy, Scikit-learn, PyTorch, Transformers, spaCy, Gensim, NLTK, Selenium, re
Experience
PiSrc
Full Stack Software Engineer
February 2022 - Present
Gained hands-on experience in software development and data pipeline implementation
- Integrated the Weaviate vector database with Azure OpenAI to power a Retrieval-Augmented Generation (RAG) [AI chatbot](https://www.rockwellautomation.com/en-us.html)
- Orchestrated 7 MuleSoft APIs to streamline delta-update ingestion into Lucidworks Fusion for the Partner Locator, ensuring a seamless user experience
- Engineered data pipelines incorporating machine learning algorithms to personalize user experiences and improve marketing conversion rates
- Built a recommendation system delivering customized products and website experiences
- Tailored AEM components to deliver customized questionnaires, complete with category support and robust reporting functionalities
Stevens Institute of Technology
Research Assistant/Teaching Assistant, Deep Learning and Web Analytics
June 2020 - December 2021
Language-feature pattern detection with deep learning models, data parsing, and statistical modeling
- Cleaned and structured 94,581 earnings call transcripts to explore 96 language factors influencing stock returns
- Achieved 72% accuracy on text classification with a domain-adapted BERT language model using PyTorch
- Conducted text analysis, sentiment analysis, and text mining using NLP techniques with spaCy and NLTK
- Set up and managed a remote GPU environment on Ubuntu for Machine Learning and Deep Learning tasks
- Developed tutorials on implementing deep learning models using related Python packages
1lift
Data Analyst Intern
April 2019 - July 2019
Gained hands-on experience with essential tools and an understanding of the development environment
- Built a cost analysis module for the application using advanced Excel functions
- Extracted structured data with MySQL and cleaned unstructured text data with Python
- Manipulated, visualized, and presented statistical results with Tableau dashboards
- Explored regression models to provide a baseline of elevator failure rates for the Operations Department
Education
Stevens Institute of Technology
MSc in Data Science (GPA 3.8/4.0)
September 2019 - December 2021
Relevant Coursework: Statistical Methods, Statistical Inference, Advanced Optimization Methods, Advanced Data Analytics & Machine Learning, Deep Learning, Natural Language Processing, Web Analytics, Database Management Systems, Web Programming, Data Structures & Algorithms
Scholarship: Provost’s Scholarship, 2019
Guangzhou University
BSc in Mathematics and Applied Mathematics
September 2014 - May 2018
Relevant Coursework: Probability and Mathematical Statistics, Operational Research, Numerical Analysis, Advanced Algebra, Mathematical Analysis, Real Function Theory, Functional Analysis, Ordinary Differential Equations, Partial Differential Equations
Awards: Second prize (out of 25,558 teams) in National Mathematical Modeling Contest, 2015
Projects
Developed a web application for furniture and rental information exchange
- Led a team of four students to design and develop a web application with Node.js and Express
- Designed document schema on MongoDB and wrapped CRUD operations as RESTful APIs
- Implemented login and user-specific functions such as account signup and an authentication system
- Developed features for rental and furniture information exchange such as comments, search, dashboard
Developed a recommender engine with collaborative filtering on JD.com interaction data
- Created and implemented recommender engine with users, items, and interaction records from JD.com
- Integrated multiple memory-based and model-based collaborative filtering algorithms to make recommendations
- Simulated on 7,000 pre-defined user-item interaction samples and attained 77% Top-10 Accuracy
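
A minimal sketch of how a memory-based recommender and a top-K accuracy check like the one above could look. Everything here is an illustrative assumption rather than the project's actual code: the function names (`item_similarity`, `recommend`, `top_k_accuracy`), the binary interaction format, the choice of item-based cosine similarity, and the definition of top-K accuracy as the share of held-out interactions whose item appears in that user's top-K list.

```python
from math import sqrt
from typing import Dict, List, Set


def item_similarity(interactions: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between items, computed from binary user-item interactions."""
    users_of: Dict[str, Set[str]] = {}
    for user, items in interactions.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)

    sim: Dict[str, Dict[str, float]] = {}
    for a, users_a in users_of.items():
        sim[a] = {}
        for b, users_b in users_of.items():
            if a == b:
                continue
            overlap = len(users_a & users_b)
            if overlap:
                sim[a][b] = overlap / sqrt(len(users_a) * len(users_b))
    return sim


def recommend(user: str, interactions: Dict[str, Set[str]],
              sim: Dict[str, Dict[str, float]], k: int = 10) -> List[str]:
    """Rank unseen items by summed similarity to the items the user already has."""
    seen = interactions.get(user, set())
    scores: Dict[str, float] = {}
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] = scores.get(other, 0.0) + s
    # Sort by score (descending), breaking ties by item id for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]


def top_k_accuracy(train: Dict[str, Set[str]], held_out: Dict[str, str], k: int = 10) -> float:
    """Fraction of held-out (user, item) pairs whose item is in the user's top-k list."""
    if not held_out:
        return 0.0
    sim = item_similarity(train)
    hits = sum(1 for user, item in held_out.items()
               if item in recommend(user, train, sim, k))
    return hits / len(held_out)


# Toy data (hypothetical): one held-out item per user.
train = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a"}}
held_out = {"u1": "c", "u3": "z"}
accuracy = top_k_accuracy(train, held_out, k=10)
```

A model-based variant (for example matrix factorization) would replace `item_similarity` and `recommend` while keeping the same evaluation function.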
Fintech Pitch Competition - 6th Place
Developed a mathematical model based on public data and metrics to measure and predict the vibrancy of U.S. cities
- Constructed a vibrancy index to interpret and predict the prosperity trend of cities in the U.S.
- Explored, collected, and blended data from Google POIs, Instagram, Zillow, and the Bureau of Labor Statistics with Pandas
- Performed feature engineering on panel data, and fine-tuned models for prediction with XGBoost and Keras
Data scraping, parsing and information extraction, and topic modeling with clustering algorithms and neural networks
- Scraped, cleaned, and structured job-description text data from Indeed.com
- Filtered noisy data by selecting clusters from the K-means algorithm and visualized the distribution of skill sets
- Categorized and analyzed skill-set trends using topic modeling and an aspect-extraction deep neural network
Feature engineering, data analysis, and modeling
- Performed data cleaning and sentiment analysis on over 20 million related Tweets at second-level granularity
- Engineered features from financial indicators such as MACD and RSI
- Analyzed, fine-tuned, and back-tested more than five types of deep learning models to increase portfolio return
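
For readers unfamiliar with the indicators named above, here is a small sketch of how MACD and RSI features can be computed from a closing-price series. It is illustrative only: the function names, the plain-list input, and the conventional default windows (12/26/9 for MACD, 14 for RSI with Wilder smoothing) are assumptions, not the settings used in the project.

```python
from typing import List, Optional, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """Return (MACD line, signal line, histogram) for a closing-price series."""
    macd_line = [f - s for f, s in zip(ema(close, fast), ema(close, slow))]
    signal_line = ema(macd_line, signal)
    hist = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, hist


def rsi(close: List[float], period: int = 14) -> List[Optional[float]]:
    """Relative Strength Index with Wilder smoothing; the first `period` entries are None."""
    deltas = [b - a for a, b in zip(close, close[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    out: List[Optional[float]] = [None] * len(close)
    if len(deltas) < period:
        return out
    # Seed with simple averages over the first `period` price changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(close)):
        if i > period:
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out
```

The resulting series can be joined with sentiment scores on a shared timestamp to form model inputs.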
Familiar Python Packages
Data Scraping | Data Manipulation | Textual Data Processing | Machine Learning and Deep Learning | Data Visualization |
---|---|---|---|---|
Selenium, BeautifulSoup | Pandas, NumPy, Regular Expression | spaCy, NLTK, Gensim | PyTorch, HuggingFace Transformers, Scikit-learn, Keras | Matplotlib, Seaborn |
Modeling Knowledge Base
Neural Networks | NLP | Regression | Classification | Dimension Reduction | Clustering | Ensembling Techniques |
---|---|---|---|---|---|---|
Transformer, Convolutional Neural Networks, Variational Autoencoder | Word2Vec, BERT, Transformer, Latent Dirichlet Allocation | Linear Regression, Ridge/Lasso Regression, Time Series Regression (ARIMA) | Support Vector Machine, Logistic Regression, Naive Bayes, Decision Tree, K-nearest Neighbors (KNN) | Principal Component Analysis (PCA), Singular Value Decomposition (SVD) | K-means, Hierarchical Clustering, DBSCAN | Bagging, Boosting, Stacking |
A sentence I love
“What I cannot create, I do not understand.”