About Me
My experience has instilled in me a strong desire to solve difficult challenges with innovative, data-driven solutions. I’m enthusiastic about projects that benefit the community, and I’m eager to help businesses build better products by deriving insights from data.
Tech Stack:
- Programming Languages: Python, Java, JavaScript, Shell Script, SQL
- Database Management Tools: Elasticsearch, Weaviate, MySQL, Microsoft SQL Server, MongoDB
- Technologies: RAG, Microsoft Azure, Nginx, Adobe Experience Manager, MuleSoft, Docker, AWS, Git, Tableau
- Python Packages: Pandas, NumPy, re, PyTorch, Transformers, Scikit-learn
Experience
PiSrc
Full Stack Software Engineer
February 2022 - Present
Gaining hands-on experience in software development and data pipeline implementation
- AI Chatbot & RAG Systems:
  - Led full-cycle development of a production-grade RAG conversational AI chatbot using Azure OpenAI, Redis, and the Weaviate vector database; architected sticky-session routing and load balancing to support 1K+ daily active users (DAUs)
  - Designed multi-turn conversational memory with Redis persistence and context-window management; built agentic workflows with function calling to orchestrate custom tools and external APIs
  - Implemented security guardrails, including input sanitization, content filtering, and PII masking, to ensure safe and compliant AI interactions
  - Delivered real-time AI Overview and autosuggest features powered by live user queries, with a caching layer for low-latency responses; built scheduled reporting pipelines and user feedback loops to continuously improve relevance and quality
  - Engineered scheduled multilingual (i18n) indexing pipelines across 10+ heterogeneous data sources, combining keyword and semantic (hybrid) search with semantic reranking, multi-channel query routing, query expansion, and iterative retrieval
- Infrastructure & Platform Engineering:
  - Architected scalable full-stack infrastructure: VM provisioning, runtime orchestration, and offline pipelines for knowledge base synchronization and cache optimization
  - Maintained high reliability and low-latency performance through proactive monitoring and tuning
  - Applied data-driven insights to evolve the CMS architecture and scale web platforms
- Data Integration & Search Optimization:
  - Designed 7+ MuleSoft API integration workflows for incremental (delta) updates of partner accounts and locations
  - Integrated multi-source data into a Solr- and Elasticsearch-based search layer with cache optimization
- Personalization & Machine Learning Pipelines:
  - Developed user-to-item and item-to-item recommendation pipelines with a rolling cache and Akamai CDN integration
  - Built offline ML pipelines to optimize sales funnel conversion and marketing campaign targeting
- Content Platform (AEM):
  - Developed licensable software modules on Adobe Experience Manager to streamline content authoring and multi-channel publishing
Stevens Institute of Technology
Research Assistant/Teaching Assistant, Deep Learning and Web Analytics
June 2020 - December 2021
Language feature pattern detection with deep learning models, data parsing, and statistical modeling
- Cleaned and structured 94,581 earnings call transcripts to explore 96 language factors influencing stock returns
- Achieved 72% accuracy on text classification with a domain-adapted BERT language model using PyTorch
- Conducted text analysis, sentiment analysis, and text mining with NLP techniques using spaCy and NLTK
- Set up and managed a remote GPU environment on Ubuntu for machine learning and deep learning tasks
- Developed tutorials on implementing deep learning models with the relevant Python packages
Education
Stevens Institute of Technology
MSc in Data Science (GPA 3.8/4.0)
September 2019 - December 2021
Relevant Coursework: Statistical Methods, Statistical Inference, Advanced Optimization Methods, Advanced Data Analytics & Machine Learning, Deep Learning, Natural Language Processing, Web Analytics, Database Management Systems, Web Programming, Data Structures & Algorithms
Scholarship: Provost’s Scholarship, 2019
Guangzhou University
BSc in Mathematics and Applied Mathematics
September 2014 - May 2018
Relevant Coursework: Probability and Mathematical Statistics, Operational Research, Numerical Analysis, Advanced Algebra, Mathematical Analysis, Real Function Theory, Functional Analysis, Ordinary Differential Equations, Partial Differential Equations
Awards: Second Prize (top 6.3% of 25,558 teams) in the National Mathematical Modeling Contest, 2015
Projects
Develop a web application for furniture and rental information exchange
- Led a team of four students to design and develop a web application with Node.js and Express
- Designed the document schema in MongoDB and wrapped CRUD operations as RESTful APIs
- Implemented login and user-specific features such as account signup and an authentication system
- Developed features for rental and furniture information exchange, such as comments, search, and a dashboard
Develop a recommender system based on JD.com user-item interaction data
- Designed and implemented a recommender engine using user, item, and interaction records from JD.com
- Integrated multiple memory-based and model-based collaborative filtering algorithms to generate recommendations
- Ran simulations on 7,000 predefined user-item interaction samples and attained 77% top-10 accuracy
Fintech Pitch Competition: 6th Place
Develop a mathematical model based on public data and metrics to measure and predict the vibrancy of U.S. cities
- Constructed a vibrancy index to interpret and predict the prosperity trends of U.S. cities
- Explored, collected, and blended data from Google POIs, Instagram, Zillow, and the Bureau of Labor Statistics using Pandas
- Performed feature engineering on panel data and fine-tuned prediction models with XGBoost and Keras
Data scraping, parsing, and information extraction; topic modeling with clustering algorithms and neural networks
- Scraped, cleaned, and structured job description text from Indeed.com
- Filtered noisy data by selecting clusters produced by K-means and visualized the distribution of skill sets
- Categorized and analyzed skill set trends using topic modeling and an aspect-extraction deep neural network
Feature engineering, data analysis, and modeling
- Performed data cleaning and sentiment analysis on over 20 million related tweets at one-second granularity
- Engineered features from financial indicators such as MACD and RSI
- Analyzed, fine-tuned, and backtested more than five types of deep learning models to increase portfolio returns
Familiar Python Packages
| Data Scraping | Data Manipulation | Textual Data Processing | Machine Learning and Deep Learning | Data Visualization |
|---|---|---|---|---|
| Selenium, BeautifulSoup | Pandas, NumPy, re (regular expressions) | spaCy, NLTK, Gensim | PyTorch, Hugging Face Transformers, Scikit-learn, Keras | Matplotlib, Seaborn |
Modeling Knowledge Base
| Neural Networks | NLP | Regression | Classification | Dimensionality Reduction | Clustering | Ensembling Techniques |
|---|---|---|---|---|---|---|
| Transformer, Convolutional Neural Networks, Variational Autoencoder | Word2Vec, BERT, Transformer, Latent Dirichlet Allocation | Linear Regression, Ridge/Lasso Regression, Time Series Regression (ARIMA) | Support Vector Machine, Logistic Regression, Naive Bayes, Decision Tree, K-Nearest Neighbors (KNN) | Principal Component Analysis (PCA), Singular Value Decomposition (SVD) | K-means, Hierarchical Clustering, DBSCAN | Bagging, Boosting, Stacking |
A Sentence I Love
“What I cannot create, I do not understand.” (Richard Feynman)