Kathan P

Summary:

Results-driven Generative AI and Python Developer with 5+ years of experience designing and delivering end-to-end, data-driven solutions. Highly skilled in Generative AI, machine learning, and Python-based development, with hands-on expertise in building and integrating AI models, developing scalable APIs using Flask, and deploying intelligent applications. Proven ability to connect AI solutions with platforms such as Google Chat, Facebook, WhatsApp, and Instagram, while driving business impact through automation, proof-of-concepts, and production-ready Generative AI tools.

Professional Snapshot:

Programming & Frameworks: Python, R Programming, Flask, Streamlit, FastAPI, TypeScript, JavaScript, Angular
Artificial Intelligence & Machine Learning: Generative AI (OpenAI, LLaMA, LLaVA, Gemini), Natural Language Processing (LangChain, Hugging Face, LlamaIndex, CrewAI), Deep Learning (TensorFlow, Keras, PyTorch)
Data Processing & Visualization: Pandas, NumPy, Matplotlib, Power BI, Time Series Analysis, Statistical Modeling
Database Management: Vector Databases (Pinecone, Weaviate), Relational Databases (PostgreSQL, MySQL), NoSQL Databases (MongoDB)
Web Development & Automation: Docker, Redis, GitHub, CI/CD Pipelines, Selenium, BeautifulSoup, RESTful API Development
Cloud & Deployment: AWS, Microsoft Azure, Google Cloud Platform (GCP), Heroku, Docker Containers

Skills:

AI Application Development and Customization
Generative AI, LLM Integration, and Prompt Engineering
Python, R, FastAPI, Flask, Streamlit, TypeScript, JavaScript
Frontend for AI Apps: HTML, CSS, JavaScript, Angular, React (AI Dashboards & Interfaces)
Model Development & Fine-Tuning: TensorFlow, PyTorch, Keras
NLP & Agent Frameworks: LangChain, Hugging Face, LlamaIndex, CrewAI
AI APIs & Services: OpenAI, Gemini, LLaMA, REST APIs, Web Services
Databases & Vector Stores: PostgreSQL, MySQL, MongoDB, Pinecone, Weaviate, FAISS
Data Processing & Feature Engineering: Pandas, NumPy, Data Cleaning, Data Migration
MLOps & Deployment: Docker, CI/CD Pipelines, Model Serving, Monitoring
Cloud Platforms: AWS, Microsoft Azure, Google Cloud Platform (GCP)
AI Automation & Integrations: Webhooks, n8n, Third-party API Integrations
Documentation & Standards: Model Documentation, UML Diagrams, Coding Standards

Work Experience:

Designed and implemented end-to-end AI and Generative AI solutions including LLM-based, RAG, and multi-agent systems
Built and fine-tuned machine learning and deep learning models for NLP, computer vision, speech, and multimodal applications
Developed AI-powered automation workflows for hiring, education, document processing, and attendance systems
Implemented data pipelines, vector databases, and semantic search for scalable AI applications
Integrated AI models with web applications using REST APIs, Flask, and FastAPI
Collaborated with cross-functional teams to deploy AI solutions on cloud platforms (AWS, Azure, GCP)
Optimized model performance, scalability, cost, and reliability across production environments

Projects:

AI Hiring Automation Team

Description: Developed an AI-powered system that streamlines the hiring process by automating candidate screening, resume analysis, and preliminary interviews. Utilizing a multi-agent architecture built with CrewAI and Large Language Models (LLMs), the system autonomously evaluates applicant qualifications, generates personalized assessment questions, and provides comprehensive candidate summaries for hiring managers. This solution significantly reduces manual review time while ensuring consistent and unbiased candidate evaluation.

Key Achievements:

Automated end-to-end candidate screening and preliminary assessment process.
Created personalized interview experiences through dynamic question generation.
Delivered comprehensive candidate evaluations with detailed strength and weakness analysis.

Technologies Used: Python, CrewAI, LLMs, Natural Language Processing, Angular, JavaScript

AI-Powered Teaching Faculty

Description: Engineered an intelligent virtual teaching assistant that leverages PhiData architecture and Large Language Models to deliver personalized educational content, interactive lessons, and real-time student assessment. The system adapts teaching methodologies based on learning progress, generates customized learning materials, and provides detailed feedback on student performance. This virtual faculty member can handle diverse subjects, respond to student queries, and maintain engaging educational interactions through natural language understanding.

Key Achievements:

Implemented adaptive learning algorithms that personalize content delivery based on student performance.
Developed automated assessment capabilities with detailed performance analytics.
Created an intuitive natural language interface for student-faculty interactions.

Technologies Used: Python, PhiData, LLMs, Natural Language Understanding, TypeScript

Video & Image RAG Engine

Description: Designed and implemented a sophisticated Retrieval-Augmented Generation (RAG) system specifically optimized for video and image content analysis. Built on LangChain architecture and integrated with advanced LLMs, the engine can process, index, and retrieve information from visual media, enabling users to query video content, extract key insights, and generate comprehensive summaries. The system incorporates visual recognition capabilities alongside semantic understanding to provide context-aware responses based on both visual and auditory elements of multimedia content.

Key Achievements:

Developed a robust indexing system for efficient retrieval from large multimedia databases.
Implemented multimodal understanding to process both visual and audio components of video content.
Created an intuitive query interface that accepts natural language questions about visual content.

Technologies Used: Python, LangChain, LLMs, Computer Vision, Vector Databases
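
The retrieval step of a RAG engine like this one can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the per-frame captions, the toy bag-of-words `embed` function, and the `retrieve` helper are all hypothetical stand-ins for a real embedding model and vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, index, k=2):
    """Return the k indexed entries most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, vec), ts, caption) for ts, caption, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical captions for sampled video frames, keyed by timestamp in seconds.
frames = [
    (0, "a presenter stands next to a whiteboard"),
    (30, "a bar chart showing quarterly revenue"),
    (60, "a dog runs across a park"),
]
index = [(ts, caption, embed(caption)) for ts, caption in frames]

top = retrieve("chart of revenue", index, k=1)
print(top[0][1], top[0][2])
```

In a full pipeline the retrieved captions (or transcript segments) would be passed to the LLM as context for answering the user's question; that generation step is omitted here.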

Speech Emotion Recognition

Description: Developed a CNN-based model that accurately recognizes emotions from speech signals by leveraging a diverse dataset. The system classifies emotions such as neutral, happy, sad, angry, fear, disgust, and surprise. It is designed for applications in human-robot interaction, speech therapy, and mental health diagnostics. Key features extracted from the audio signals, including pitch, volume, and energy, are used to enhance recognition performance.

Key Achievements:

Achieved high accuracy in classifying speech into multiple emotion categories using deep learning.
Effectively extracted and utilized key audio features to improve model performance and reliability.

Technologies Used: TensorFlow, NumPy
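
Two of the audio features named above, energy and pitch, can be computed per frame roughly as shown below. This is a sketch under assumptions rather than the project's feature extractor: the function names, the 50 to 500 Hz search range, and the synthetic 200 Hz test tone are all illustrative, and the pitch estimate uses plain autocorrelation, which is only one of several possible methods.

```python
import math


def rms_energy(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency by picking the autocorrelation peak.

    Only lags corresponding to frequencies between fmin and fmax are searched
    (an assumed range that roughly covers speech).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic test signal: 0.1 s of a 200 Hz sine wave sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

energy = rms_energy(tone)
pitch = estimate_pitch(tone, sample_rate)
print(round(energy, 3), round(pitch, 1))
```

Features like these would typically be stacked frame by frame and fed to the classifier alongside or instead of a spectrogram.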

Subtitle Translate

Description: This project is a web application that enables users to upload audio or video files (MP3/MP4) and automatically transcribe the speech using the Google Speech-to-Text API. The resulting transcription is then translated into a target language utilizing the Helsinki-NLP/opus-mt model. Built with Flask and deployed locally, the application allows users to download both the transcription and translation as text files. It is particularly useful for individuals and professionals who need quick, accurate transcription and translation of multimedia content.

Key Achievements:

Seamlessly transcribes and translates audio/video content into multiple languages including Spanish, French, German, Hindi, Arabic, Chinese, Russian, Portuguese, and Urdu.
Provides a user-friendly interface for efficient upload, processing, and download of subtitles.

Technologies Used: Flask, Hugging Face, Google Speech-to-Text API, Helsinki-NLP/opus-mt, LLM

Automated Attendance System Using Face Recognition

Description: Engineered a comprehensive attendance management system utilizing advanced face recognition technologies to automate student or employee attendance tracking. The system captures real-time video feeds, detects faces, and matches them against a pre-registered database to mark attendance without manual intervention. Features include anti-spoofing measures, attendance report generation, and notification alerts for management. The solution significantly reduces administrative overhead while improving attendance accuracy and eliminating proxy attendance issues.

Key Achievements:

Implemented robust face detection and recognition with high accuracy in varied lighting conditions.
Developed a user-friendly interface for system administration and attendance reporting.
Integrated security features to prevent spoofing and unauthorized access.

Technologies Used: Python, OpenCV, Deep Learning, Face Recognition Algorithms, Database Management
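
The matching step described above, comparing a detected face against a pre-registered database, can be sketched as a thresholded nearest-neighbor lookup. Everything here is an assumption for illustration: the three-dimensional embeddings, the names, the `identify` and `mark_attendance` helpers, and the 0.6 distance threshold are hypothetical, and a real system would obtain embeddings from a face recognition model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, registry, threshold=0.6):
    """Return the closest registered name, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, reference in registry.items():
        dist = euclidean(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


def mark_attendance(embeddings, registry, threshold=0.6):
    """Collect the set of registered people recognized in a batch of detections."""
    present = set()
    for embedding in embeddings:
        name = identify(embedding, registry, threshold)
        if name is not None:
            present.add(name)
    return present


# Hypothetical registry and detections (real embeddings have far more dimensions).
registry = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}
detected = [[0.12, 0.88, 0.31], [5.0, 5.0, 5.0]]

present = mark_attendance(detected, registry)
print(present)
```

Rejecting matches beyond the threshold is what keeps unregistered faces from being marked present; anti-spoofing checks would run before this step and are not shown.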

BotaniScan

Description: Developed and fine-tuned a YOLOv8 model for robust plant and leaf detection in agricultural and ecological applications. Leveraging the DIMPSAR dataset, which comprises 24 classes of Indian medicinal plants and leaves, the model achieves high mAP50, precision, and recall during testing. Enhanced accuracy and processing speed were attained through optimized YOLOv8 anchors, enabling real-time detection and classification in video feeds.

Key Achievements:

Successfully detects and classifies 24 distinct plant species and leaf types in video streams.
Achieved high performance metrics in terms of mAP50, precision, and recall, demonstrating robust plant recognition capabilities.

Technologies Used: YOLOv8, Object Detection, Fine-Tuning, DIMPSAR Dataset

Docvision

Description: Developed an advanced automated invoice text extraction system, "Docvision," that combines YOLOv8-based object detection with OCR (Tesseract) to accurately identify and extract key information from invoices. The system effectively detects customer details, invoice information, product listings, subtotals, tax, and total amounts. Integrated with a user-friendly Flask web application, it allows users to upload invoice images and visualize extracted text in real time. Performance is rigorously evaluated using metrics such as precision, recall, mAP50, and mAP50-95.

Key Achievements:

Accurately detects and extracts critical invoice components including customer details, invoice information, products, subtotal, tax, and total amounts from diverse invoice formats.

Technologies Used: YOLOv8, OCR, Tesseract OCR, Fine-Tuning, Flask
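
The detection metrics mentioned above rest on intersection-over-union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU and a single precision/recall point at the 0.5 threshold used by mAP50. It is a simplified illustration with made-up boxes and hypothetical helper names: a full mAP calculation also ranks predictions by confidence and averages precision over the recall curve, and mAP50-95 repeats this over IoU thresholds from 0.50 to 0.95.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_threshold=0.5):
    """Greedily match each prediction to an unused ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in preds:
        best_j, best_iou = None, 0.0
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            score = iou(pred, truth)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Made-up boxes: one good detection, one false positive, one missed field.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]

precision, recall = precision_recall(preds, truths)
print(precision, recall)
```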

Real-Time YOLO Object Detection System

Description: Developed a robust real-time object detection system utilizing the YOLOv8 model to identify 80 object classes from the COCO dataset. The system integrates OpenCV to process live webcam feeds and video files, and features a custom GUI that displays detected objects with bounding boxes and class labels in real time. This solution ensures efficient and accurate detection for interactive applications.

Key Achievements:

Achieved real-time detection and classification of 80 COCO classes with high accuracy and low latency.
Designed an intuitive GUI for seamless visualization of detected objects on live video inputs.

Technologies Used: YOLOv8, OpenCV

LinkedIn Profile & Udemy Course Scraper

Description: Developed an automated web scraping tool designed to extract and analyze professional data from LinkedIn profiles and educational content from Udemy courses. Using Selenium and BeautifulSoup, the system navigates through LinkedIn's complex DOM structure to gather career trajectories, skills, and professional networks, while simultaneously cataloging Udemy course content, reviews, and pricing information. The collected data is structured, normalized, and stored for talent acquisition, market research, and educational content analysis.

Key Achievements:

Created a robust web scraping framework capable of handling JavaScript-heavy websites.
Implemented intelligent data extraction algorithms that adapt to varying page structures.
Developed comprehensive data processing pipelines for cleaning and normalizing collected information.

Technologies Used: Python, Selenium, BeautifulSoup, Data Processing, Web Scraping
