Eight months ago, I hit a wall with my computer science curriculum. A critical CS exam required me to painstakingly write out pseudocode for algorithms I could already implement in Python, Kotlin, or Swift blindfolded. It wasn't about understanding the logic; it was about the tedious, repetitive formatting and the sheer volume of material to 're-learn' in a specific, artificial way. This wasn't effective studying; it was rote memorization, and frankly, it felt like an archaic gatekeeping mechanism.
Instead of grinding through flashcards and endless practice problems that felt beneath my actual coding abilities, I decided to build a system that would learn the exam patterns and generate practice questions, pseudocode, and even evaluate my answers. My goal wasn't to cheat, but to outsource the rote aspects of studying to an AI, freeing me to focus on deeper conceptual understanding and practical application. This is the story of how I spent eight months building an AI Exam App, leveraging FastAPI, custom LLM pipelines, and MongoDB to turn my frustration into a functional, intelligent study assistant.
The Problem: Inefficient Studying & Knowledge Gaps
Traditional studying methods, especially for technical subjects, often suffer from several shortcomings:
- Repetitive Drills: Manual creation and answering of practice questions is time-consuming.
- Subjectivity in Evaluation: Self-grading complex answers, especially pseudocode, is challenging.
- Passive Learning: Reading notes repeatedly often lacks active recall.
- Lack of Targeted Practice: Identifying specific weaknesses and generating questions around them can be difficult.
My vision was an application that could ingest course material (PDFs, lecture notes, textbooks), process it with an LLM, generate relevant questions in various formats (multiple-choice, short answer, pseudocode problems), and then evaluate user responses. This required a robust backend, an intelligent AI core, and an efficient data store.
Core Architecture: A Pythonic, Asynchronous Powerhouse
I structured the application around a microservices-inspired architecture, primarily powered by Python. Here’s the high-level breakdown:
- Frontend: A simple React.js frontend (for the MVP) handles user interaction, presenting questions and receiving answers.
- Backend API: FastAPI was the clear choice for its asynchronous capabilities, Pydantic data validation, and excellent developer experience. It serves as the brain, orchestrating all operations.
- AI Core: A combination of OpenAI's GPT models for general reasoning and fine-tuned open-source models (like Llama 3 for specific tasks) via Hugging Face Transformers.
- Database: MongoDB was selected for its flexibility with unstructured and semi-structured data, perfect for storing diverse question types, user progress, and LLM outputs.
- Queueing: Celery with Redis as a broker for handling long-running LLM inference tasks asynchronously, preventing API timeouts.
The entire system needed to be fast, responsive, and scalable, especially given the potentially high latency of LLM calls. FastAPI's native `async`/`await` support with `uvicorn` was critical here.
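To make the latency point concrete, here is a minimal sketch using only the standard library. `fake_llm_call` is a hypothetical stand-in that sleeps instead of calling a model, so the numbers are illustrative, not measurements from the app. It shows that awaiting several slow calls together takes roughly as long as the slowest one, not the sum of all of them.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, latency: float) -> str:
    """Hypothetical stand-in for a slow LLM request; sleeps instead of calling a model."""
    await asyncio.sleep(latency)  # yields control to the event loop while waiting
    return f"response to: {prompt}"


async def run_concurrently(prompts: list[str], latency: float = 0.2) -> tuple[list[str], float]:
    """Await all calls together and report the total wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_llm_call(p, latency) for p in prompts))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    answers, elapsed = asyncio.run(run_concurrently(["q1", "q2", "q3"]))
    print(answers, f"{elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the coroutines were passed in, so each answer can be matched back to its prompt.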
# app/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List
import asyncio

from .database import get_db_client, fetch_material, save_question_bank, get_question_bank_by_id
from .llm_service import generate_questions_from_text, evaluate_answer
from .celery_worker import generate_questions_task, evaluate_answer_task

app = FastAPI(
    title="AI Exam App Backend",
    description="API for generating and evaluating exam questions using LLMs.",
    version="0.1.0",
)


# Request and response schemas, validated by Pydantic
class MaterialInput(BaseModel):
    title: str
    content: str
    category: str


class QuestionGenerateResponse(BaseModel):
    task_id: str
    message: str


class AnswerInput(BaseModel):
    question_id: str
    user_answer: str
    context_material_id: str


class EvaluationResponse(BaseModel):
    task_id: str
    message: str


@app.post("/material", response_model=QuestionGenerateResponse, status_code=202)
async def upload_material_and_generate_questions(material: MaterialInput):
    # In a real app, save material to DB first, then pass ID
    # For this example, we'll pass content directly or a dummy ID
    # Note: on Pydantic v2, model_dump() replaces the deprecated dict()
    task = generate_questions_task.delay(material.dict())
    return QuestionGenerateResponse(task_id=task.id, message="Question generation initiated.")


@app.get("/question_bank/{bank_id}")
async def get_question_bank(bank_id: str):
    question_bank = await get_question_bank_by_id(bank_id)
    if not question_bank:
        raise HTTPException(status_code=404, detail="Question bank not found")
    return question_bank


@app.post("/answer/evaluate", response_model=EvaluationResponse, status_code=202)
async def submit_answer_for_evaluation(answer: AnswerInput):
    # Queue the evaluation and return immediately with the task ID
    task = evaluate_answer_task.delay(answer.dict())
    return EvaluationResponse(task_id=task.id, message="Answer evaluation initiated.")

# More endpoints for task status, user management, etc.

This modular approach allowed me to develop components in parallel and swap out LLM providers or database strategies as needed without rewriting the entire application. When deploying this kind of high-performance backend, I often refer back to guides like "Minimal FastAPI Deployment on DigitalOcean" to ensure robust and efficient hosting from day one.
Intelligent Data Handling with MongoDB
For an application dealing with diverse and evolving data (course materials, various question types, user responses, LLM-generated feedback), a flexible schema was crucial. MongoDB, a NoSQL document database, was a natural fit.
Here's how I structured the collections:
- `materials`: Stores the original course content (e.g., PDF text, lecture notes). Documents include `_id`, `title`, `content_text`, `category`, `upload_date`.
- `question_banks`: Stores generated sets of questions, linked to a specific material. Each document contains `_id`, `material_id`, `generation_params` (e.g., difficulty, question types requested), and an array of `questions`.
- `questions`: While questions are nested in `question_banks`, I considered a separate collection for fine-grained indexing and retrieval if questions needed to be queried independently across multiple banks. Each question document would have `_id`, `question_text`, `question_type`, `options` (for MCQs), `correct_answer`, `generated_feedback_template`.
- `user_responses`: Records user attempts. Documents include `_id`, `user_id`, `question_id`, `user_answer`, `submission_date`, `llm_evaluation` (e.g., score, detailed feedback).
Using Motor, the asynchronous driver for MongoDB, alongside FastAPI's async capabilities, ensured that database operations didn't block the event loop, maintaining high performance even under load.
# app/database.py
from motor.motor_asyncio import AsyncIOMotorClient
from bson import ObjectId

MONGO_DETAILS = "mongodb://localhost:27017"

client = AsyncIOMotorClient(MONGO_DETAILS)
database = client.exam_app_db


async def get_db_client():
    return database


async def fetch_material(material_id: str):
    # A malformed id would make ObjectId() raise, so treat it as "not found"
    if not ObjectId.is_valid(material_id):
        return None
    db = await get_db_client()
    material = await db.materials.find_one({"_id": ObjectId(material_id)})
    return material


async def save_question_bank(question_bank_data: dict):
    db = await get_db_client()
    result = await db.question_banks.insert_one(question_bank_data)
    return str(result.inserted_id)


async def get_question_bank_by_id(bank_id: str):
    # A malformed id would make ObjectId() raise, so treat it as "not found"
    if not ObjectId.is_valid(bank_id):
        return None
    db = await get_db_client()
    bank = await db.question_banks.find_one({"_id": ObjectId(bank_id)})
    # Convert ObjectId to str for JSON serialization
    if bank:
        bank["_id"] = str(bank["_id"])
        if "material_id" in bank:
            bank["material_id"] = str(bank["material_id"])
        for q in bank.get("questions", []):
            if "_id" in q:
                q["_id"] = str(q["_id"])
    return bank

This approach provided the agility needed for rapid prototyping and iteration. MongoDB's flexibility is a significant advantage when dealing with evolving data models, especially in AI applications where outputs can vary. This mirrors the flexible data needs I've encountered in other projects, like building smart job agents, where diverse data points for resumes and job descriptions demand a schema-less approach, as discussed in "Beyond Keywords: Building Smart Job Agents with FastAPI & MongoDB".
The AI Core: Prompt Engineering & RAG Pipeline
This is the heart of the application. The AI's ability to generate accurate questions and evaluate answers depends on effective prompt engineering and a robust Retrieval Augmented Generation (RAG) pipeline.
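To show the general shape of a RAG step, here is a toy sketch using only the standard library: split the material into chunks, pick the chunks most similar to a topic, and place them in the prompt. It is not the pipeline from this app. A real setup would typically use embeddings and a vector index, while this example swaps in a simple bag-of-words cosine similarity. All function names and the prompt wording are assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split material into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a crude substitute for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = _vector(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, _vector(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(topic: str, chunks: list[str], top_k: int = 2) -> str:
    """Augment the generation prompt with the retrieved context."""
    context = "\n---\n".join(retrieve(topic, chunks, top_k))
    return (
        "Using only the context below, write one exam question about "
        f"{topic}.\n\nContext:\n{context}"
    )
```

Grounding the prompt in retrieved chunks keeps generated questions tied to the uploaded material instead of whatever the model happens to recall about the topic.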
Prompt Engineering for Question Generation
The core challenge was to make the LLM generate diverse, relevant, and appropriately difficult questions. My prompts evolved significantly over time. Initial attempts were simple, like
Written by
Hazrat Ummar Shaikh
Android Developer with 4+ years of experience. Built production Android apps, Ktor backends, Discord bots, and SaaS products using Kotlin, Python, and MongoDB. Passionate about building robust systems and writing clean code.

