A production-grade, agentic RAG platform.
In a world buzzing with AI demonstrations, moving from a simple chatbot prototype to a robust, production-grade system remains a monumental leap. Many projects showcase impressive capabilities in isolation but fall short when faced with the realities of scalability, reliability, and real-world data. They are demos, not durable systems. Tesla Mech Agent was built to bridge that gap.
Finding clear, step-by-step repair information for Tesla vehicles can be a significant challenge. Technicians and enthusiasts often have to sift through dense, lengthy official documentation, trying to piece together the correct procedure. This process is time-consuming, inefficient, and can lead to costly mistakes.
The core issues are:
Tesla Mech Agent was built to solve this exact problem. It provides a simple, chat-based UI that allows users to ask for repair instructions in natural language and receive clear, step-by-step guidance, complete with diagrams and warnings, directly from the source material. It transforms a frustrating search process into a helpful conversation.
This project is a complete, full-stack AI platform that provides a reference architecture for building production-ready Retrieval-Augmented Generation (RAG) systems and intelligent AI agents. It intentionally moves beyond the “toy chatbot” paradigm to tackle the critical engineering challenges of a real-world system: architecture, scalability, observability, and correctness.
The repository serves as:
The entire platform is designed as a cohesive, end-to-end system that translates a user’s natural language question into a precise, media-rich answer. The architecture is best understood by following the journey of a user’s query from the interface to the final response.
Figure: An overview of the Tesla Mech Agent’s modular and layered architecture, showcasing the interaction between different components from user interface to data storage.
Here’s how it works:
Query & Embedding: The process begins when a user submits a query through the Chat UI. The query is sent to the central Agent API, which uses an embedding model hosted on Ollama to convert the text into a vector representation.
Relevance Analysis & Retrieval: Before proceeding, the agent’s core logic determines if the query is relevant, as detailed in the “Brains” section below. If the query is valid, the agent uses the vector to search for the most relevant document chunks in the Qdrant vector database.
Document & Image Fetching: Once the correct document is identified from the search results, the agent retrieves the full markdown content and any associated images directly from MinIO, our S3-compatible object storage.
Generation & Final Response: With the retrieved context in hand, the agent uses the generation model (also on Ollama) to synthesize a final, grounded answer. This response, complete with formatted text and images from MinIO, is then sent back to the user in the Chat UI.
Throughout this process, the entire flow is monitored by Pheonix for tracing, evaluation, and observability, ensuring the system remains reliable and performant. This modular, service-oriented design allows each component to be scaled and maintained independently, creating a truly production-grade RAG platform.
This project wasn’t just built to work; it was built to last. The entire system is founded on a set of core principles that reflect modern, production-grade software engineering.
The heart of the platform is a deterministic, decision-driven agent implemented with LangGraph. This graph-based approach ensures that the agent’s behavior is predictable, traceable, and easy to debug—critical features for any production system.

The agent follows a clear, multi-step process to handle user requests, from initial query analysis to final, grounded answer generation.
generate_query: An LLM analyzes the user’s input, validates if it’s a relevant query, and rewrites it for optimal retrieval.search_documents: Performs efficient semantic retrieval from a Qdrant vector store to find the most relevant information.Grading Decision: A conditional LLM “grader” checks if the retrieved documents are sufficient to answer the query. This is a critical quality gate.generate_answer: If the documents pass the grade, this node generates a final answer, strictly grounded in the retrieved context.answer_failure / irrelevant_query: If a query is invalid or the information is insufficient, the agent routes to a safe exit path, informing the user clearly without fabricating information.A production system requires a commitment to quality and continuous improvement.
The project includes a full evaluation framework to objectively measure the performance of the retrieval stack, with metrics like Recall@K, Precision@K, and NDCG@K. This allows for data-driven comparisons between different embedding models, retrieval strategies, and RAG pipelines.
The stack includes a complete workflow for contrastive fine-tuning of embedding models, mirroring real industrial practices. This covers everything from hard negative mining and custom dataset abstractions to publishing models on the Hugging Face Hub.
Core Technologies
Tesla Mech Agent stands as a robust demonstration of a production-ready RAG platform. Key takeaways include:
This project is built for educational and portfolio purposes. It demonstrates engineering patterns and system design, not proprietary Tesla data or workflows.
Built with care by Jalal Khaldi & KARIM ZAGHLAOUI ML Engineers · AI Systems · Retrieval & Agents