Wei Yang

Full-Stack Software Engineer at PiSrc

Email: hey.weiyang@gmail.com

Phone: +1 551 260 0541

Web: inscribedeeper.github.io/

About Me

My experience has instilled in me a strong desire to solve difficult challenges with innovative data-driven solutions. I’m enthusiastic about projects that benefit the community, and I’m eager to assist businesses in developing better products by using data to derive insights.

Tech Stack:

  • Programming Languages: Python, Java, JavaScript, Shell Script, SQL
  • Database Management Tools: PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, Weaviate
  • Technology: Adobe Experience Manager (AEM), MuleSoft, Docker, AWS, Jupyter Notebook, Git, Tableau
  • Python Packages: Pandas, NumPy, Scikit-learn, PyTorch, Transformers, spaCy, Gensim, NLTK, Selenium, re

Experience

PiSrc

Full-Stack Software Engineer

February 2022 - Present

Gaining hands-on experience in software development and data pipeline implementation

  • Integrated Weaviate Vector Database with LLMs to empower company-specific content-driven recommendations and elevate chatbot conversations
  • Orchestrated 7 MuleSoft APIs, streamlining delta-update ingestion into Lucidworks Fusion for the Partner Locator and ensuring a seamless user experience
  • Engineered data pipelines infused with Machine Learning algorithms, driving personalized user experiences and optimizing marketing conversion rates
  • Tailored AEM components to deliver customized questionnaires, complete with category support and robust reporting functionalities

Stevens Institute of Technology

Research Assistant/Teaching Assistant, Deep Learning and Web Analytics

June 2020 - December 2021

Language-feature pattern detection with deep learning models, data parsing, and statistical modeling

  • Cleaned and structured 94,581 earnings call transcripts to explore 96 language factors influencing stock returns
  • Achieved 72% accuracy on text classification with a domain-adapted BERT language model using PyTorch
  • Conducted text analysis, sentiment analysis, and text mining using NLP techniques with spaCy and NLTK
  • Set up and maintained a remote GPU machine on Ubuntu for machine learning and deep learning tasks
  • Developed tutorials on implementing deep learning models using related Python packages

1lift

Data Analyst Intern

April 2019 - July 2019

Gained hands-on experience with the necessary tools and an understanding of the development environment

  • Built a cost analysis module for the application using advanced Excel functions
  • Extracted structured data with MySQL and cleaned unstructured textual data with Python
  • Manipulated, visualized, and presented statistical results with Tableau dashboards
  • Explored regression models to provide a baseline of elevators’ failure rate for the Operations Department

Education

Stevens Institute of Technology

MSc in Data Science (GPA 3.8/4.0)

September 2019 - December 2021

Relevant Coursework: Statistical Methods, Statistical Inference, Advanced Optimization Methods, Advanced Data Analytics & Machine Learning, Deep Learning, Natural Language Processing, Web Analytics, Database Management Systems, Web Programming, Data Structures & Algorithms

Scholarship: Provost’s Scholarship, 2019

Guangzhou University

BSc in Mathematics and Applied Mathematics

September 2014 - May 2018

Relevant Coursework: Probability and Mathematical Statistics, Operational Research, Numerical Analysis, Advanced Algebra, Mathematical Analysis, Real Function Theory, Functional Analysis, Ordinary Differential Equations, Partial Differential Equations

Awards: Second prize (out of 25,558 teams) in the National Mathematical Modeling Contest, 2015

Projects

MyPlace Web Development

Developed a web application for furniture and rental information exchange

  • Led a team of four students to design and develop a web application with Node.js and Express
  • Designed document schema on MongoDB and wrapped CRUD operations as RESTful APIs
  • Implemented login and user-specific functions such as account signup and an authentication system
  • Developed features for rental and furniture information exchange, such as comments, search, and a dashboard

E-commerce Recommender System

Developed a recommender engine based on user, item, and interaction records from JD.com

  • Created and implemented a recommender engine with users, items, and interaction records from JD.com
  • Integrated multiple memory-based and model-based collaborative filtering algorithms to make recommendations
  • Simulated on 7,000 pre-defined user-item interaction samples and attained 77% top-10 accuracy

Fintech Pitch Competition - 6th Place

Developed a mathematical model based on public data and metrics to measure and predict the vibrancy of U.S. cities

  • Constructed a vibrancy index to interpret and predict the prosperity trends of U.S. cities
  • Explored, collected, and blended data from Google POIs, Instagram, Zillow, and the Bureau of Labor Statistics with Pandas
  • Performed feature engineering on panel data and fine-tuned prediction models with XGBoost and Keras

Quantifying the Impact of AI on Job Skills

Data scraping, parsing and information extraction, and topic modeling with clustering algorithms and neural networks

  • Scraped, cleaned, and structured job description text data from Indeed.com
  • Filtered noisy data by selecting clusters from the K-means algorithm and visualized the distribution of skill sets
  • Categorized and analyzed skill-set trends using topic modeling and an aspect-extraction deep neural network

Analysis of Factors Influencing Bitcoin

Feature engineering, data analysis, and modeling

  • Performed data cleaning and sentiment analysis on over 20 million related tweets at second-level granularity
  • Engineered features from financial indicators such as MACD and RSI
  • Analyzed, fine-tuned, and back-tested more than five types of deep learning models to increase portfolio return

Familiar Python Packages

  • Data Scraping: Selenium, BeautifulSoup
  • Data Manipulation: Pandas, NumPy, Regular Expression
  • Textual Data Processing: spaCy, NLTK, Gensim
  • Machine Learning and Deep Learning: PyTorch, HuggingFace Transformers, Scikit-learn, Keras
  • Data Visualization: Matplotlib, Seaborn

Modeling Knowledge Base

  • Neural Networks: Transformer, Convolutional Neural Networks, Variational Autoencoder
  • NLP: Word2Vec, BERT, Transformer, Latent Dirichlet Allocation
  • Regression: Linear Regression, Ridge/Lasso Regression, Time Series Regression (ARIMA)
  • Classification: Support Vector Machine, Logistic Regression, Naive Bayes, Decision Tree, K-nearest Neighbors (KNN)
  • Dimension Reduction: Principal Component Analysis (PCA), Singular Value Decomposition (SVD)
  • Clustering: K-means, Hierarchical Clustering, DBSCAN
  • Ensembling Techniques: Bagging, Boosting, Stacking

The sentences I love

“What I cannot create, I do not understand.”