Notes from the Build Process
Hi, I'm Ketaki.
"Mere data makes a man." Probably the most haunting line in the whole movie, and also, accidentally, the most accurate description of what I study. CS grad student here, and this is where I write about the things I build and what they end up teaching me.
The line comes from this exchange in Blade Runner 2049:
Joi: Mere data makes a man. A and C and T and G. The alphabet of you. All from four symbols. I am only two: 1 and 0.
'K': Half as much but twice as elegant, sweetheart.
An AI telling a human he's just data. In 2049, that read as poetry. In 2025, it reads as a product roadmap. What I find interesting isn't the reductionism, it's the confidence. Joi doesn't say "you could be described as data." She says you are it. Four symbols, full stop. And yet she counts herself as lesser for having fewer. That tension between compression and completeness is basically the question I think about every day. How much of a person survives the encoding.
It echoes what Tyrell tells Roy in the original Blade Runner: "The light that burns twice as bright burns half as long." It's the same obsession: the things made of the fewest symbols carry the most weight.
Project Logs
Patrona: Making "Text Me When You Get Home" Proactive
Feb 28, 2026
A voice-first safety companion built at Columbia AI for Good that stays on a live call, detects silence or safe words, and escalates with real-time GPS alerts.
Building ViziAssist: Screenless Edge AI for Indian Roads
Dec 2, 2025
A team-built assistive driving system using YOLOv7 on Jetson Nano with GPIO LED alerts, tested on Indian roads and later published at CVIP/Springer CCIS.
PaperTrail: Real-Time SEC Filing Contradiction Agent
Dec 15, 2025
An event-driven financial NLP system that ingests SEC filings, runs hybrid retrieval across pgvector and Neo4j, and surfaces live contradiction alerts.
Controlling a 3D Hand with Raw Brainwaves
Sep 12, 2025
A full Brain-Computer Interface pipeline using Emotiv EEG signals, FFT/wavelet feature extraction, and KNN classification to drive real-time 3D hand motion.
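The EEG-to-classification part of a pipeline like this can be sketched on synthetic signals. Everything below is illustrative rather than the project's actual configuration: the 128 Hz sampling rate, the theta/alpha/beta band choices, and the two-class setup are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
FS = 128  # assumed sampling rate; real Emotiv settings may differ

def band_power(signal, fs, low, high):
    """Mean FFT power of `signal` within the [low, high) Hz band."""
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= low) & (freqs < high)
    return power[mask].mean()

def features(signal, fs=FS):
    # Classic EEG bands: theta, alpha, beta
    bands = [(4, 8), (8, 13), (13, 30)]
    return [band_power(signal, fs, lo, hi) for lo, hi in bands]

def synth_trial(dominant_hz):
    """Synthetic one-second 'EEG' trial: dominant oscillation plus noise."""
    t = np.arange(FS) / FS
    return np.sin(2 * np.pi * dominant_hz * t) + 0.3 * rng.standard_normal(FS)

# Two made-up classes, distinguished by dominant frequency band
X = [features(synth_trial(10)) for _ in range(20)]   # alpha-dominant -> class 0
X += [features(synth_trial(20)) for _ in range(20)]  # beta-dominant  -> class 1
y = [0] * 20 + [1] * 20

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```

A real system would replace `synth_trial` with windowed multi-channel EEG and feed the predicted class into the 3D hand controller.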
Cross-Lingual Transfer and Parameter Efficiency in Indic Encoders: A Comparative Study of IndicBERT-v2 and MuRIL
Dec 15, 2025
A comparative NLP project on Hindi-to-Marathi hate speech transfer showing how LoRA improves stability and performance over full fine-tuning in low-resource settings.
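The parameter-efficiency claim comes from how LoRA works: the pretrained weight stays frozen and only a low-rank update is trained. A minimal numpy sketch of a single LoRA-adapted linear layer (the dimensions, rank, and scaling here are arbitrary stand-ins, not the project's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

# Frozen pretrained weight (random stand-in; in the project this would sit
# inside an IndicBERT-v2 or MuRIL encoder layer)
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA adds W + (alpha/r) * B @ A, training only A and B
A = rng.standard_normal((r, d_in)) * 0.01  # down-projection, small random init
B = np.zeros((d_out, r))                   # up-projection, zero init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0, the adapted layer starts out identical to the frozen one,
# which is part of why LoRA training tends to be stable
assert np.allclose(lora_forward(x), W @ x)

trainable = A.size + B.size  # r * (d_in + d_out) parameters
full = W.size                # d_in * d_out parameters
```

Here `trainable` is 1024 parameters against 4096 for full fine-tuning; the gap widens sharply at transformer scale, where `d_in` and `d_out` are in the hundreds or thousands.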
Building an Institutional-Grade Risk Engine in Python
Dec 24, 2025
A proprietary Python quant engine with Streamlit, SQLite caching, and Monte Carlo simulation for institutional-grade risk analytics beyond retail dashboards.
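The Monte Carlo side of a risk engine usually reduces to: simulate many return paths, convert to P&L, read off a tail quantile. A minimal Value-at-Risk sketch, assuming normally distributed daily returns (a modeling assumption for illustration, not necessarily what the engine uses):

```python
import numpy as np

rng = np.random.default_rng(42)

def monte_carlo_var(mu, sigma, value, horizon_days=1, n_sims=100_000, alpha=0.95):
    """Value-at-Risk from simulated returns.

    mu and sigma are the daily return mean and volatility; returns are
    drawn from a normal model here purely for the sketch.
    """
    returns = rng.normal(mu * horizon_days,
                         sigma * np.sqrt(horizon_days),
                         n_sims)
    pnl = value * returns
    # VaR at confidence alpha: the loss exceeded in (1 - alpha) of scenarios
    return -np.quantile(pnl, 1 - alpha)

# Hypothetical portfolio: $1M, 2% daily vol
var_95 = monte_carlo_var(mu=0.0005, sigma=0.02, value=1_000_000)
```

The simulation approach earns its keep once the portfolio has fat tails, correlations, or path dependence that closed-form VaR can't capture; swapping the sampler is a one-line change.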
CanMan: Natural Language Food Ordering
Nov 15, 2024
Built FreeRaj as the core full-stack canteen platform with recommendations and vendor analytics, then layered CanMan NLP ordering on top (2nd Place at HACKMITWPU).
SkillSet Sherpa: AI Career Counseling via OCR and LLMs
Apr 15, 2025
An OCR + LLM system that parses marksheets, combines RIASEC psychometrics, and generates personalized career recommendations (Springer LNNS publication).
One View: Smart Facial Clustering for Event Photos
Oct 10, 2024
An event photo platform that uses face embeddings plus DBSCAN clustering to auto-group attendees and deliver personalized galleries.
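Why DBSCAN for this job: you don't know how many attendees there are, and faces the model can't match should fall out as noise rather than being forced into a cluster. A sketch on synthetic embeddings (the 128-dimensional vectors, cluster count, and `eps` value are all made up for illustration; real embeddings would come from a face-recognition model):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

rng = np.random.default_rng(1)

# Stand-ins for face embeddings: three "people", five photos each
centers = rng.standard_normal((3, 128))
embeddings = np.vstack([c + 0.05 * rng.standard_normal((5, 128)) for c in centers])
embeddings = normalize(embeddings)  # unit vectors, so cosine distance behaves

# min_samples=2: a person needs at least two photos to form a cluster;
# eps is tuned to the synthetic noise scale here
labels = DBSCAN(eps=0.5, metric="cosine", min_samples=2).fit_predict(embeddings)
# labels holds one cluster id per face; -1 marks an unmatched face
```

Each resulting cluster id maps to one attendee's personalized gallery; the `-1` bucket collects faces that never recur.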
Papers & Advances
Columbia Class: Continual Learning
This section includes my takeaways from Continual Learning and Memory Models under Prof. Richard Zemel at Columbia. We used a role-playing seminar format where each presentation approached a paper from a different perspective: author, reviewer, archeologist, researcher, or wild card. These posts are my short notes and insights from the papers I presented.
Keeping Foundation Models from Forgetting Everything (Playing the Author)
Class Notes
Author-lens notes on arXiv:2408.14471, focusing on compute-bounded continual multimodal pretraining, FoMo-in-Flux, and the stability-plasticity tradeoff.
Hybrid Computing with Dynamic External Memory, in Plain English
Class Notes
Sci-Comm notes on Differentiable Neural Computers (DNCs): why external memory matters, how it enables structured reasoning, and what it implies for lifelong memory systems.
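The core of "why external memory matters" is that a DNC reads from its memory matrix by content, not by address. A toy sketch of DNC-style content-based addressing (cosine similarity to a read key, sharpened by a strength parameter, then a soft weighted read; the tiny memory and `beta` value are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def content_read(memory, key, beta=5.0):
    """DNC-style content addressing: cosine similarity between the read key
    and each memory row, sharpened by strength beta, then a weighted read."""
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    )
    weights = softmax(beta * sims)
    return weights @ memory, weights

# Toy 3-slot memory; rows are whatever the controller previously wrote
memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# A noisy key close to row 0 retrieves (mostly) row 0
read_vec, w = content_read(memory, np.array([0.9, 0.1, 0.0]))
```

Because the whole read is a differentiable soft attention over memory rows, gradients flow through it and the controller can learn what to store and retrieve, which is what separates this from an ordinary lookup table.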
Stress-Testing "Pacing Over Ordering" (Playing the Researcher)
Class Notes
Researcher-lens notes on "When Do Curricula Work?" with a follow-up proposal testing pacing vs ordering across modality, learning-rate schedules, and class imbalance.