ankita priya

🩺 diabetes prediction

what it is. a machine-learning pipeline that predicts diabetes risk from a handful of clinical features — glucose, bmi, insulin, age — with careful handling of missing values and aggressive feature selection. this project later became my ijmit paper (see papers).

why i built it

clinical datasets look clean in the textbook and are a landmine in real life. i wanted to know, at a gut level, what happens when you treat the unglamorous parts — missing values, leakage, class imbalance — as the main event instead of a side quest.

what i learned

missing values are a modelling choice. imputation is a lie you tell the model. which lie you tell changes the result more than which model you pick.
feature selection beats the model zoo. i tried eleven models. picking the right three features moved f1 more than picking the "best" model did.
a paper is a forcing function. writing the work up honestly, for peer review, is where i caught my own shortcuts.
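
to make the imputation point concrete, here is a minimal sketch, not the code from the project. the glucose values are made up for illustration, and the helper names `fit_fill_value` and `impute` are hypothetical. it shows two common "lies" (mean and median) producing different fill values from the same column, and it learns each fill value from the training split only, so nothing from the test split leaks in.

```python
from statistics import mean, median

# Toy glucose readings, made up for illustration; None marks a missing value.
train_glucose = [85.0, 90.0, None, 100.0, 180.0]
test_glucose = [None, 110.0]


def fit_fill_value(values, strategy):
    """Learn the fill value from the observed (non-missing) training values only."""
    observed = [v for v in values if v is not None]
    return mean(observed) if strategy == "mean" else median(observed)


def impute(values, fill_value):
    """Replace each missing entry with a fill value learned on the training split."""
    return [fill_value if v is None else v for v in values]


# Same column, two different "lies": the outlier (180.0) pulls the mean up,
# while the median ignores it.
mean_fill = fit_fill_value(train_glucose, "mean")
median_fill = fit_fill_value(train_glucose, "median")

# The test split is filled with values learned on the training split,
# never with statistics computed from the test rows themselves.
test_mean = impute(test_glucose, mean_fill)
test_median = impute(test_glucose, median_fill)

print("mean fill:", mean_fill, "median fill:", median_fill)
```

with one skewed reading in the training column, the two strategies already disagree on what a "missing" patient looks like, and any model trained downstream inherits that choice.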