Notes from the Build Process

Hi, I'm Ketaki.

"Mere data makes a man." Probably the most haunting line in the whole movie and also accidentally the most accurate description of what I study. CS grad student here and this is where I write about the things I build and what they end up teaching me.

The line comes from this exchange in Blade Runner 2049:

Joi: Mere data makes a man. A and C and T and G. The alphabet of you. All from four symbols. I am only two: 1 and 0.

'K': Half as much but twice as elegant, sweetheart.

An AI telling a human he's just data. In 2049, that read as poetry. In 2025, it reads as a product roadmap. What I find interesting isn't the reductionism; it's the confidence. Joi doesn't say "you could be described as data." She says you are it. Four symbols, full stop. And yet she counts herself as lesser for having fewer. That tension between compression and completeness is basically the question I think about every day: how much of a person survives the encoding?

It echoes what Tyrell tells Roy in the original Blade Runner: "The light that burns twice as bright burns half as long." It has the same obsession. The things made of the fewest symbols carry the most weight.

Project Logs

Patrona: Making "Text Me When You Get Home" Proactive

Feb 28, 2026

A voice-first safety companion built at Columbia AI for Good that stays on a live call, detects silence or safe words, and escalates with real-time GPS alerts.

Voice AI Hackathon Winner

Building ViziAssist: Screenless Edge AI for Indian Roads

Dec 2, 2025

A team-built assistive driving system using YOLOv7 on Jetson Nano with GPIO LED alerts, tested on Indian roads and later published at CVIP/Springer CCIS.

Computer Vision Edge AI

PaperTrail: Real-Time SEC Filing Contradiction Agent

Dec 15, 2025

An event-driven financial NLP system that ingests SEC filings, runs hybrid retrieval across pgvector and Neo4j, and surfaces live contradiction alerts.

Agentic AI Financial NLP

Controlling a 3D Hand with Raw Brainwaves

Sep 12, 2025

A full Brain-Computer Interface pipeline using Emotiv EEG signals, FFT/wavelet feature extraction, and KNN classification to drive real-time 3D hand motion.

Signal Processing BCI

Cross-Lingual Transfer and Parameter Efficiency in Indic Encoders: A Comparative Study of IndicBERT-v2 and MuRIL

Dec 15, 2025

A comparative NLP project on Hindi-to-Marathi hate speech transfer showing how LoRA improves stability and performance over full fine-tuning in low-resource settings.

NLP Research LoRA / PEFT

Building an Institutional-Grade Risk Engine in Python

Dec 24, 2025

A proprietary Python quant engine with Streamlit, SQLite caching, and Monte Carlo simulation for institutional-grade risk analytics beyond retail dashboards.

Python Engine Quantitative Finance

CanMan: Natural Language Food Ordering

Nov 15, 2024

Built FreeRaj as the core full-stack canteen platform with recommendations and vendor analytics, then layered CanMan NLP ordering on top (2nd Place at HACKMITWPU).

NLP Chatbot Recommendation Engine

SkillSet Sherpa: AI Career Counseling via OCR and LLMs

Apr 15, 2025

An OCR + LLM system that parses marksheets, combines RIASEC psychometrics, and generates personalized career recommendations (Springer LNNS publication).

LLM Orchestration OCR Pipeline

One View: Smart Facial Clustering for Event Photos

Oct 10, 2024

An event photo platform that uses face embeddings plus DBSCAN clustering to auto-group attendees and deliver personalized galleries.

Computer Vision DBSCAN Clustering

Papers & Advances

Columbia Class: Continual Learning

This section collects my takeaways from Continual Learning and Memory Models, taught by Prof. Richard Zemel at Columbia. The seminar used a role-playing format in which each presentation approached a paper from a different perspective: author, reviewer, archeologist, researcher, or a wild-card role. These posts are my short notes and insights on the papers I presented.

Keeping Foundation Models from Forgetting Everything (Playing the Author)

Class Notes

Author-lens notes on arXiv:2408.14471, focusing on compute-bounded continual multimodal pretraining, FoMo-in-Flux, and the stability-plasticity tradeoff.

Continual Learning Author Lens

Hybrid Computing with Dynamic External Memory, in Plain English

Class Notes

Sci-Comm notes on Differentiable Neural Computers (DNCs): why external memory matters, how it enables structured reasoning, and what it implies for lifelong memory systems.

Science Communicator Lens Memory Models

Stress-Testing "Pacing Over Ordering" (Playing the Researcher)

Class Notes

Researcher-lens notes on "When Do Curricula Work?" with a follow-up proposal testing pacing vs ordering across modality, learning-rate schedules, and class imbalance.

Researcher Lens Curriculum Learning