LLM Resume Agent

Link to GitHub Repo
Project Motivation and Overview
I built this website to create an interactive way to engage with my resume. After conversations with friends I had the idea to use a LLM (Large Language Model) to  filter and retrieve relevant information answering all questions in one place.

The current version of the chatbot runs the Meta-Llama-3-8B-Instruct model hosted with Modal. A vector-based document retrieval system is used to retrieve relevent context. Responses are guided with prompt engineering with final answers returned to the user (You!)
Key Skills
This project leverages Modal for deployment and NVIDIA TensorRT-LLM for accelerated inference with the Meta-Llama-3-8B-Instruct model. I engineered the core Python-based pipeline, which included quantizing the Llama 3 model to FP8, building an optimized TensorRT-LLM engine, and creating a FAISS vector index from local document embeddings generated by Sentence Transformers (all-MiniLM-L6-v2). Additionally, I integrated the chatbot with my Webflow website, ensuring seamless user interaction while maintaining structured and precise resume data retrieval.
Explore my work, projects, and educational experience with this chatbot, powered by a locally hosted LLM. You can select a pre-written question or type your own to get relevant, context-aware answers.