Sirion Autoextraction
Responsibilities:
- Responsible for collecting relevant data and developing system models, prediction algorithms, solutions to prescriptive analytics problems, data mining techniques, and/or econometric models
- Communicated results to the operations team to support decision-making, and collected data needs and requirements by interacting with other departments
- Performed data cleaning using pandas, NumPy, and regex; applied business-specific stop word removal and pattern replacement after multiple experiments and EDA
- Experimented with multiple token split windows for tokenizing text longer than 512 words
- Experimented with multiple activation functions for BERT hidden units, such as Swish, GELU, and ReLU
- Experimented with multiple pretrained models such as BERT, ALBERT, and XLNet
- Experimented with translation and reverse translation to balance skewed classes
- Maintained data uniformity by checking similarity with overlapping categories and handling text features below a certain length threshold
- Applied multiple text extraction techniques, such as question answering and NER, for information retrieval
- Developed a service for auto-extraction from contract documents using NLP classification, NER, and question answering models built on state-of-the-art BERT embeddings
- Designed the service using Flask, Pulsar, and Python
- Devised an accuracy metric for the NER model using fuzzy matching
- Monitored model performance over time and updated models when needed
- Containerized models with Docker for seamless deployment
- Updated old models, increasing extraction efficiency by 70 percent
Environment: Python 3.x, Linux, Spark, Microsoft Excel, spaCy, NLTK, Transformers, Dash, Plotly
Web application for training automation
Responsibilities:
- Automated the NLP training process
- Created a UI trainer using Dash and Python
- Stored metadata using MongoDB
Automation of inventory management system
Responsibilities:
- Responsible for developing and deploying risk-based decision tools and building knowledge-based systems to solve large-scale computational problems
- Communicated results to the operations team to support decision-making, and collected data needs and requirements by interacting with other departments
- Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy
- Used statistical methods such as missing value analysis, outlier detection, and feature selection to prepare relevant data for machine learning models
- Experimented with and built predictive models, including ensemble methods such as gradient boosting trees and extreme gradient boosting
- Reduced the dimensionality of the data by merging multiple datasets
- Created a dashboard using Skywise Slate
- Created a regression model for EDT of inventories using XGBoost
- Delivered insights into the causes of part delivery delays using feature weight interpretation
- Boosted part delivery by 30 percent
CERTIFICATIONS
- Udacity Deep Learning Certification
- Udacity NLP Nanodegree
- Simplilearn Statistics
Sales forecasting of flight parts
Responsibilities:
- Responsible for developing and deploying a time-series-based forecasting model
- Communicated results to the operations team to support decision-making, and collected data needs and requirements by interacting with other departments
- Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy
- Prepared data for sequence modeling
- Trained and deployed an LSTM-based forecasting model to forecast sales for the next year
- Provisioned online training based on data drift and concept drift
Concession Classification
Responsibilities:
- Collected multiple datasets
- Cleaned and preprocessed the datasets using NumPy, pandas, and regex
- Ensured consistency and integrity of the dataset
- Created a concession classification model using LSTM and word2vec
- Reduced human intervention by up to 80 percent
Insurance reassignment classification
Responsibilities:
- Developed a consolidated dataset from 35 DB Parquet files using Spark on AWS EMR
- Performed extensive EDA using Python, seaborn, and pandas profiling
- Selected features using a forward elimination process and PCA
- Created a reassignment insurance classifier using CatBoost
- Deployed the model to production using Docker
- Analyzed the features causing reassignment of claims
Trainee Performance Predictor
Responsibilities:
- Curated a dataset from the ILP database
- Created features using aggregation
- Selected features using RFE
- Created a model using the XGBoost classifier
- Deployed the model to production using Flask and Gunicorn
- Delivered several insights using feature interpretation, such as the features causing students to fail and areas for improvement