Deepanshu Katariya

Aspiring Data Scientist | Machine Learning Engineer
Dehradun, IN

About

Detail-oriented Computer Science Engineering student with a strong foundation in data analysis, machine learning, and software development. Hands-on experience in data annotation for Large Language Models and in building data pipelines and web applications. Eager to apply analytical skills and technical expertise to data science and machine learning initiatives.

Work

Sunix AI

Data Annotator Intern

Summary

Contributed to data annotation for AI model development, focusing on the quality and relevance of data for machine learning applications.

Highlights

Annotated and tagged video data to generate high-quality, structured input for Large Language Model (LLM) training, ensuring data accuracy and integrity.

Collaborated with cross-functional teams to standardize data annotation practices, ensuring consistency across diverse video datasets.

Optimized data annotation workflows to improve the quality and relevance of training data for machine learning models.

Managed large volumes of complex video data, developing a keen attention to detail and a comprehensive understanding of end-to-end machine learning workflows.

Education

Uttarakhand Technical University

Bachelor of Technology

Computer Science Engineering

Languages

English

Fluent

Hindi

Native

Skills

Programming Languages

Python, C/C++, SQL, JavaScript.

Frameworks

Django, FastAPI, Flask, Flask-RESTful.

Databases

MySQL, PostgreSQL.

Developer Tools

Git, Jupyter Notebook, Microsoft Excel, SQL Server Management Studio, Visual Studio Code, Google Looker Studio, Tableau, Microsoft Power BI, Docker, Postman.

Data Science & Machine Learning

Data Analysis, Data Warehousing, Data Science, Machine Learning, Artificial Intelligence, GenAI, Computer Vision, TensorFlow, Scikit-learn, LightGBM, Pandas, NumPy, Matplotlib, Seaborn, PySpark.

Cloud Platforms

Azure Data Lake Gen 2, Azure Databricks, Azure Synapse Analytics, Azure Data Factory.

Software Development & Methodologies

Design Patterns, Frontend Development, Unit Testing, Computer Architecture, Problem Solving, Computer Networks, Network Security, Object-Oriented Programming, Database Management Systems (DBMS), Statistics, Analytical Thinking, ETL (Extract, Transform, Load), Data Analytics, Data Visualization, Version Control, CI/CD, Software Development Life Cycle, Algorithms, Debugging, Code Refactoring.

Projects

Sentiment Analysis on Tweets

Summary

Developed a sentiment analysis model for airline tweets in Python, built as an end-to-end machine learning pipeline.

Olympic Data Analysis

Summary

Designed and implemented an end-to-end data pipeline and analytics solution for Olympic data using Azure cloud services and PySpark.

Django CSV Analysis Web Application

Summary

Developed a full-stack Django web application for interactive analysis and visualization of CSV files, making the data easier to access and explore.