Data Science Projects
Machine learning models and predictive analytics solutions
Data Cleaning Assistant
A Python-based automation tool that helps clean and preprocess datasets efficiently, reducing manual effort and ensuring consistency in data preparation for analysis.
This project provides a set of tools for cleaning messy datasets, including handling missing values, standardizing formats, removing duplicates, and transforming columns for analysis-ready data.
- Features: Automated data cleaning functions, duplicate handling, normalization, and data validation.
- Tools: Python (Pandas, NumPy), JSON for configuration and user settings.
- Use Case: Prepares large datasets for machine learning, visualization, or reporting without manual intervention.
- Outcome: Saves time and reduces errors in preprocessing, allowing data scientists to focus on analysis and modeling.
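A config-driven cleaning pass like the one described above might look roughly like the following sketch. The configuration keys (`strip_whitespace`, `lowercase`, `fill_missing`, `drop_duplicates`), the `clean` function, and the sample data are all hypothetical names invented for illustration; they are not taken from the project itself.

```python
import json

import pandas as pd

# Hypothetical JSON settings; the key names are illustrative only.
CONFIG_JSON = """
{
  "strip_whitespace": ["city"],
  "lowercase": ["city"],
  "fill_missing": {"age": "median"},
  "drop_duplicates": true
}
"""


def clean(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    """Apply the cleaning steps named in config and return a new DataFrame."""
    out = df.copy()
    # Standardize text formats first so near-duplicates become exact duplicates.
    for col in config.get("strip_whitespace", []):
        out[col] = out[col].str.strip()
    for col in config.get("lowercase", []):
        out[col] = out[col].str.lower()
    # Fill missing values with the column median or with a constant.
    for col, strategy in config.get("fill_missing", {}).items():
        if strategy == "median":
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(strategy)
    # Remove exact duplicate rows last.
    if config.get("drop_duplicates", False):
        out = out.drop_duplicates().reset_index(drop=True)
    return out


# Made-up sample data for the demo.
raw = pd.DataFrame({
    "city": [" Boston", "boston ", "Denver", "Denver"],
    "age": [30.0, 30.0, None, 40.0],
})
cleaned = clean(raw, json.loads(CONFIG_JSON))
print(cleaned)
```

Standardizing formats before removing duplicates matters here: the two Boston rows only collapse into one after whitespace and case have been normalized.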
Cyclistic Bike-Share Analysis
End-to-end data workflow using SQL for preparation and Python for EDA to compare how casual riders and annual members use the service.
Follows the six-step framework of the Google Data Analytics capstone (Ask, Prepare, Process, Analyze, Share, Act) to quantify behavioral differences between rider segments and translate the findings into membership growth strategies.
- Tools: PostgreSQL for data prep, Python (Pandas/Seaborn/Plotly) for analysis, Tableau for dashboards.
- Focus: Ride duration, day-type patterns, bike types, station hotspots, and seasonal trends.
- Outcome: Marketing and timing recommendations for converting casual riders to members.
- Key Insights: Casual riders ride mostly on weekends and take longer trips, while members ride mainly for weekday commuting.
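The segment comparison above could be sketched in Pandas along these lines. The column names (`rider_type`, `started_at`, `ended_at`) and the six sample rides are assumptions made up for the example, not the project's actual schema or data.

```python
import pandas as pd

# Made-up rides; 2023-07-01 and 2023-07-02 fall on a weekend.
rides = pd.DataFrame({
    "rider_type": ["casual", "casual", "member", "member", "casual", "member"],
    "started_at": pd.to_datetime([
        "2023-07-01 10:00", "2023-07-02 14:00", "2023-07-03 08:00",
        "2023-07-05 17:30", "2023-07-03 12:00", "2023-07-01 09:00",
    ]),
    "ended_at": pd.to_datetime([
        "2023-07-01 10:40", "2023-07-02 14:30", "2023-07-03 08:12",
        "2023-07-05 17:44", "2023-07-03 12:20", "2023-07-01 09:16",
    ]),
})

# Ride duration in minutes.
rides["duration_min"] = (
    rides["ended_at"] - rides["started_at"]
).dt.total_seconds() / 60

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
rides["day_type"] = (rides["started_at"].dt.dayofweek >= 5).map(
    {True: "weekend", False: "weekday"}
)

# Ride count and mean duration per rider segment and day type.
summary = (
    rides.groupby(["rider_type", "day_type"])["duration_min"]
    .agg(["count", "mean"])
)
print(summary)
```

The same grouping could be extended with bike type or start station to cover the other focus areas listed above.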
Salifort Motors - Employee Attrition
Classification workflow using HR data to predict which employees are likely to leave and to surface actionable drivers for retention.
Builds and evaluates classification models on HR features (projects, tenure, hours, evaluations) to flag employees likely to leave and inform policy changes.
- Business Goal: Reduce attrition cost and improve satisfaction through proactive retention strategies.
- Methods: Data wrangling and modeling in Jupyter notebooks, using a supervised classification approach.
- Key Features: Number of projects, years at company, monthly hours worked, evaluation scores.
- Recommendations: Limit concurrent projects to 3-4 (5 at most), consider promotion at around four years of tenure, clarify overtime expectations, hold open discussions about company culture, and reward employees in proportion to the work they take on.
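A supervised classification workflow of this kind might be sketched as follows. Everything here is an assumption for illustration: the data is synthetic, the column names are hypothetical, the rule that generates the `left` label is invented for the demo, and a decision tree stands in for whichever models the project actually used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic HR data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
hr = pd.DataFrame({
    "number_project": rng.integers(2, 8, size=n),
    "tenure_years": rng.integers(1, 11, size=n),
    "average_monthly_hours": rng.integers(120, 311, size=n),
    "last_evaluation": rng.uniform(0.3, 1.0, size=n).round(2),
})

# Invented label rule for the demo: overloaded employees leave.
hr["left"] = (
    (hr["number_project"] >= 6) & (hr["average_monthly_hours"] > 220)
).astype(int)

X = hr.drop(columns="left")
y = hr["left"]

# Stratify so both splits keep the same share of leavers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A shallow tree keeps the learned rules easy to read.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Feature importances hint at which factors drive the predictions.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
test_accuracy = model.score(X_test, y_test)

print(importances)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Because the labels come from a made-up rule, the scores say nothing about real attrition; the sketch only shows the shape of the workflow (split, fit, evaluate, inspect drivers).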
Automatidata - Predicting Taxi Gratuities
Classifies NYC Yellow Taxi trips by whether the rider tips generously (more than 20%), using machine learning models trained on trip features such as duration, distance, and fare.
This project builds multiple models (including random forest) on NYC Yellow Taxi 2017 trip records to classify whether a rider will tip generously, focusing on interpretable trip features.
- Data: 2017 Yellow Taxi trip records (~408k trips) with timestamps, locations, distance, fares, payment types.
- Features: Trip duration, distance, total and fare amounts, vendor, and payment type; duration, distance, and cost rank highest in feature importance.
- Modeling: Random forest classification with 86% accuracy and 72% precision for generous tipping prediction.
- Use Cases: Inform driver expectations for tip likelihood and explore pricing/route patterns correlating with higher gratuities.
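To make the target and the two reported metrics concrete, here is a minimal sketch on five made-up trips. It assumes the tip percentage is measured against `fare_amount` (the text above does not say which base is used), and the `predicted` vector is a hypothetical model output, not a result from the project.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score

# Made-up trips for illustration.
trips = pd.DataFrame({
    "fare_amount": [10.0, 20.0, 8.0, 30.0, 12.0],
    "tip_amount": [2.5, 2.0, 2.0, 3.0, 3.0],
})

# Assumption: tip percentage is computed relative to the fare amount.
trips["tip_pct"] = trips["tip_amount"] / trips["fare_amount"]

# Binary target: 1 when the tip exceeds 20%.
trips["generous"] = (trips["tip_pct"] > 0.20).astype(int)

# Hypothetical model output, used only to show how the metrics are computed.
predicted = [1, 1, 1, 0, 0]

# Accuracy: share of all trips classified correctly.
accuracy = accuracy_score(trips["generous"], predicted)
# Precision: share of predicted-generous trips that really were generous.
precision = precision_score(trips["generous"], predicted)

print(trips)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Accuracy and precision can diverge as they do here, which is why both are worth reporting when the cost of a false "generous" prediction matters.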
Advanced ML Projects Coming Soon
Advanced machine learning projects featuring deep learning, NLP, and computer vision applications are currently in development.
Machine Learning Expertise
Specialized in end-to-end ML solutions from data preprocessing to model deployment
Supervised Learning
Classification and regression models using scikit-learn, XGBoost, and ensemble methods
Feature Engineering
Advanced data preprocessing, feature selection, and dimensionality reduction techniques
Model Optimization
Hyperparameter tuning, cross-validation, and performance evaluation strategies
Python Ecosystem
Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, and Jupyter for data science workflows
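As an illustration of the hyperparameter tuning and cross-validation listed above, a small grid search with scikit-learn could look like this. The dataset is synthetic (`make_classification`), and the parameter grid and F1 scoring choice are arbitrary assumptions for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data for the demo.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Small, arbitrary grid so the search finishes quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# 3-fold cross-validation, scoring each candidate by F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

In practice a held-out test set would still be kept aside, since the cross-validated score of the winning configuration tends to be optimistic.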
Tools & Technologies
Programming
Machine Learning
Data Analysis
Platforms
Interested in Data Science Solutions?
Looking for machine learning models, predictive analytics, or data science consulting? Let's discuss your project requirements.
Get in Touch