Project Motivation and Overview
I built this website to create an interactive way to engage with my resume. After conversations with friends I had the idea to use a LLM (Large Language Model) to filter and retrieve relevant information answering all questions in one place.
The current version of the chatbot runs the Meta-Llama-3-8B-Instruct model hosted with Modal. A vector-based document retrieval system is used to retrieve relevent context. Responses are guided with prompt engineering with final answers returned to the user (You!)
Key Skills
This project leverages Modal for deployment and NVIDIA TensorRT-LLM for accelerated inference with the Meta-Llama-3-8B-Instruct model. I engineered the core Python-based pipeline, which included quantizing the Llama 3 model to FP8, building an optimized TensorRT-LLM engine, and creating a FAISS vector index from local document embeddings generated by Sentence Transformers (all-MiniLM-L6-v2). Additionally, I integrated the chatbot with my Webflow website, ensuring seamless user interaction while maintaining structured and precise resume data retrieval.