Hey there! Welcome to my page. I am currently working as a Senior Consultant - Data Science at Guidehouse, Inc. I am interested in
building scalable data science models and frameworks, with an appetite for DevOps and engineering.
I have a strong background and interest in product data science, machine learning, mathematics, and statistics.
I am always on the lookout for exciting and impactful work in the industry as well as the non-profit sector.
I completed my Master's in Quantitative Finance from Rutgers University in December 2019.
Before that, I earned a Bachelor of Engineering in Electronics and Telecommunication from Pune Institute of Computer Technology, Pune, in 2013.
Email / LinkedIn / GitHub / Resume
Senior Consultant - Data Science (Guidehouse Inc.) November '20 - Present
Developing a robust, reusable next-generation XGBoost risk-model pipeline to detect fraud and money-laundering activity across 10M+ customers and 100M+ transaction records
Identified minimum thresholds for suspected financial crime and fraud using statistical methods, reducing monthly alerts by 40% and cutting the time Fin-Crime COPS needed to evaluate fraud/mule cases from a week to 12 hours
Replaced existing fraud rules with new ones, improving the true positive rate of the final model from 5% to 34%
Delivered data-driven insights and prepared reports, written analyses, quantitative exhibits, and other client deliverables on time
Modeling Associate (PwC) March '20 - November '20
Extracted KYC and transaction data for 1.5 million accounts using SQL, then cleaned and transformed the raw data for analysis
Developed a predictive model to identify customers with high financial-crime risk and evaluated the strongest predictors using Logistic Regression and Anomaly Detection. Improved accuracy by a further 8% using Regularization and Decision Trees
Automated the loan approval process by building a Natural Language Processing (NLP) tool, saving 20 person-hours per week and speeding up the approval cycle by 65%
Data Scientist Intern (HSBC Bank USA N.A.) February '20 - March '20
Developed a multilabel text classifier using NLP techniques to identify 12 types of risk in daily news. Auditors used the results to classify news by risk type, reducing the time required by 60%
Used Beautiful Soup to scrape 40,000 news articles and visualized them as word clouds to identify the most common words
Achieved a 15% accuracy increase over the baseline with Naïve Bayes and Random Forest by using a custom tokenizer with TF-IDF
Associate/Programmer Analyst (Cognizant Technology Solutions) January '14 - May '18
Created an ARIMA time-series model to forecast next-day sales, helping the client plan inventory levels
Improved model robustness by engineering trend and seasonality features that captured patterns in the sales data, reducing manual effort by 12 hours per week
Measured the effectiveness and ROI (Return on Investment) of promotional channels in the marketing mix to inform brand strategy. Improved brand performance by ~$3 million by determining optimal investments with Linear and Regularized Regression
Determined inter-tactic relationships between promotional channels, which extended the engagement to 2 additional geographies
Wrote maintainable SQL scripts and automated the data validation process to reduce cross-functional dependencies
Achievement: PILLAR of the Month Award twice from the client at Cognizant
Bank-Term-Deposit-Subscription (Link)
Developed a predictive model to classify whether a customer will subscribe to a bank term deposit.
Improved model accuracy by 8% by tuning XGBoost parameters, and identified the key drivers of term-deposit subscription.
Loan Default Prediction (Link)
The goal of this assignment was to analyze the provided data and build a model that predicts whether a loan will be paid in full or charged off (a charge-off occurs when a loan has been unpaid or delinquent long enough that the financial institution judges the debt unlikely to be collected).
Implemented several ML models with different sampling techniques and achieved the target results. [Link to Results]
Future versions of the model will incorporate customer segmentation to improve precision and recall.
Telecom-Customer-Churn-Prediction (Link)
Built a machine learning model to predict customer churn, helping telecom companies prevent the loss of customers.
Explored the distribution of customers across attributes such as gender, phone services, dependents, and internet services, and identified the key factors driving customer churn.
Prudential Life Insurance Risk Assessment (Link)
The goal of this project was to develop a predictive model that classifies risk accurately and with a more automated approach, using multiclass machine learning algorithms. Accurate risk assessment is critical for insurance firms because the downside of misclassifying a customer is extremely large.
Performed exploratory data analysis, then followed two different approaches, implementing various ML models to reach the desired results.
This was a school group project. One important lesson we learned: we had believed there was a fixed hierarchy among algorithms in terms of accuracy, but that is not always true. On some datasets, a linear model can beat ensemble techniques.
Achievements and Volunteer Experience
Received the 'PILLAR of the Month' Award twice from the client at Cognizant.
Ranked 2nd among 100,000 students in the HSC (12th standard) examination and, as a result, received the Central Sector Scholarship for all 4 years of engineering.
Worked with Outreach and Sponsorship teams to raise awareness.