Traditional career counseling is often generic, failing to keep pace with the dynamic job market or account for a student's unique combination of aptitude and academic performance. We wanted a purely data-driven approach.
My team and I built SkillSet Sherpa, an AI-powered system that reads student marksheets using Optical Character Recognition (OCR) and combines the extracted grades with a psychometric test to generate personalized career recommendations from an LLM. This research was published in Springer's Lecture Notes in Networks and Systems.
The Architecture
Tech Stack
- Backend: Python, Flask
- Computer Vision: EasyOCR (CNN-based)
- Data Processing: Pandas, Openpyxl
- AI: LLM integration & Prompt Engineering
The OCR Pipeline
Extracting structured tabular data from messy, scanned marksheets is notoriously difficult. We benchmarked several OCR engines, including Tesseract, PaddleOCR, and docTR, before selecting EasyOCR for its balance of accuracy and ease of integration.
Once a user uploads a scan via our Flask backend, the CNN-based EasyOCR model reads the image. We then use openpyxl and pandas to clean up the extracted text, map the headers, and save the structured grades as a CSV file for analysis.
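A minimal sketch of how OCR output can be turned into table rows, assuming the marksheet is a simple grid. The function names and the y_tolerance value are illustrative assumptions, not the project's actual code, and the sketch writes CSV with Python's standard csv module instead of pandas so that it runs without extra dependencies. Only read_marksheet needs EasyOCR installed.

```python
import csv
import io


def read_marksheet(image_path):
    """Run EasyOCR on a scanned marksheet (requires `pip install easyocr`)."""
    import easyocr  # imported lazily so the helpers below work without it

    reader = easyocr.Reader(["en"])  # loads the English models
    # Each detection is (bounding_box, text, confidence); the bounding box
    # is a list of four [x, y] corner points.
    return reader.readtext(image_path)


def group_into_rows(detections, y_tolerance=12):
    """Group detections into table rows by vertical position.

    y_tolerance (pixels) is a hypothetical tuning knob: detections whose
    vertical centers lie within it of a row's first cell join that row.
    """
    items = []
    for bbox, text, _confidence in detections:
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        items.append((sum(ys) / len(ys), min(xs), text.strip()))
    items.sort(key=lambda item: item[0])  # top to bottom

    rows, current, current_y = [], [], None
    for y, x, text in items:
        if current and abs(y - current_y) > y_tolerance:
            rows.append(current)
            current = []
        if not current:
            current_y = y  # the first cell anchors the row
        current.append((x, text))
    if current:
        rows.append(current)

    # Order the cells of each row from left to right.
    return [[text for _x, text in sorted(row)] for row in rows]


def rows_to_csv(rows):
    """Serialize rows (header first) to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hand-written detections in EasyOCR's output shape, for illustration only.
sample = [
    ([[10, 10], [80, 10], [80, 30], [10, 30]], "Subject", 0.99),
    ([[200, 12], [260, 12], [260, 32], [200, 32]], "Marks", 0.98),
    ([[200, 52], [230, 52], [230, 72], [200, 72]], "88", 0.95),
    ([[10, 50], [90, 50], [90, 70], [10, 70]], "English", 0.97),
]
table = group_into_rows(sample)
csv_text = rows_to_csv(table)
```

From here, the CSV text could be loaded with pandas.read_csv(io.StringIO(csv_text)) for header mapping and cleanup. Real marksheets with skewed scans or merged cells would need more robust row detection than a fixed pixel tolerance.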
Psychometrics: The RIASEC Model
Grades tell only half the story. We implemented the Holland Codes (RIASEC) test to quantify a user's affinity for Realistic, Investigative, Artistic, Social, Enterprising, and Conventional work environments. The backend ingests the raw survey inputs and converts them into normalized percentage scores, creating a numerical profile of the user's interests and preferred work environments.
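One way the normalization could work, as a sketch: assume each survey item is a 1-to-5 rating tagged with one RIASEC category, and express each category as a percentage of its maximum possible score. The rating scale, the input shape, and the normalize_riasec name are assumptions made for illustration; the actual scoring scheme may differ.

```python
RIASEC_CATEGORIES = (
    "Realistic",
    "Investigative",
    "Artistic",
    "Social",
    "Enterprising",
    "Conventional",
)


def normalize_riasec(responses, max_rating=5):
    """Convert raw ratings per category into percentages of the maximum.

    responses maps a category name to a list of item ratings, for example
    {"Artistic": [5, 4, 5]}. Categories with no answers score 0.0.
    """
    percentages = {}
    for category in RIASEC_CATEGORIES:
        ratings = responses.get(category, [])
        if not ratings:
            percentages[category] = 0.0
            continue
        best_possible = len(ratings) * max_rating
        percentages[category] = round(100 * sum(ratings) / best_possible, 1)
    return percentages


profile = normalize_riasec({"Artistic": [5, 5, 5], "Social": [1, 2, 3]})
```

Scoring against each category's own maximum keeps the six percentages comparable even when the categories have different numbers of questions.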
LLM Orchestration & Prompt Engineering
The final step is where the system actually "thinks". The pipeline dynamically injects both the academic CSV data and the normalized RIASEC percentages into a highly structured prompt.
Because the LLM must consider both datasets together, it outputs specific career paths and justifies its reasoning by explicitly mapping each path to the required educational streams (for example, advising a student to major in English or Mass Communication if they scored high in Artistic and Social traits and excelled in language subjects).
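To make the injection step concrete, here is a hedged sketch of a prompt builder. The template wording, the build_career_prompt name, and the section labels are all hypothetical; the sketch only assembles the prompt string and leaves the actual LLM call to whichever client library is in use.

```python
def build_career_prompt(grades_csv, riasec_percentages):
    """Combine academic grades and RIASEC scores into one structured prompt."""
    # List the strongest interests first so they are easy to spot.
    ranked = sorted(
        riasec_percentages.items(), key=lambda pair: pair[1], reverse=True
    )
    riasec_lines = "\n".join(
        f"- {name}: {score:.1f}%" for name, score in ranked
    )
    return (
        "You are a career counselor. Use BOTH data sources below.\n\n"
        "## Academic record (CSV)\n"
        f"{grades_csv.strip()}\n\n"
        "## RIASEC interest profile\n"
        f"{riasec_lines}\n\n"
        "## Task\n"
        "Recommend three career paths. For each one, name the educational "
        "stream or major it requires and cite the specific grades and "
        "interest scores that support it."
    )


prompt = build_career_prompt(
    "Subject,Marks\nEnglish,88\n",
    {"Artistic": 90.0, "Social": 75.5, "Realistic": 20.0},
)
```

Asking the model to cite the grades and scores behind each recommendation is what pushes it to tie every career path back to both inputs instead of relying on one of them.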