I've been in the trenches building high-performance API backends and custom Discord bots for years, and one recurring challenge always crops up: managing context for AI models effectively. I recall a specific incident where a custom moderation bot, built on Node.js, started suffering from inconsistent responses after processing a few thousand messages. The issue wasn't the model itself, but how we were feeding it context — a patchwork of ad-hoc session management and prompt concatenation. It was a nightmare to debug and even harder to scale. That's when I really started looking for a more structured approach, and the Model Context Protocol (MCP) began gaining traction.
MCP, which originated as a niche project, has truly evolved into an industry-standard for orchestrating model interactions, especially when dealing with long-running conversations or complex, evolving states. It's a game-changer for anyone serious about production AI applications, moving beyond simple one-shot prompts to genuinely stateful, context-aware interactions. And as a Pythonista, naturally, I gravitated towards building out MCP servers with FastAPI due to its performance and developer-friendly async capabilities.
Understanding the Model Context Protocol (MCP)
At its core, MCP provides a standardized way for an AI model client (your application) to manage and present context to a model server. Think of it as a sophisticated API for conversational memory. Instead of stuffing every past utterance or relevant piece of data into a single, ever-growing prompt string, MCP defines structured messages, turns, and context objects. This separation allows for more efficient token management, better model performance, and significantly improved maintainability of your AI applications.
Why is this critical? Imagine building a code-generation assistant that needs to understand your entire project's structure, past commit messages, and specific file contents. Without MCP, you'd be passing massive, redundant context blobs with every request, quickly hitting token limits and incurring unnecessary costs. With MCP, you establish a session, push relevant context (like file contents or GitHub issue history), and then only send differential updates or specific queries, letting the server intelligently manage the model's working memory.
The Pillars of MCP
- Context Objects: Structured data payloads representing various pieces of information (e.g., chat history, document fragments, code snippets, user profiles).
- Turns: Individual interactions within a session, typically a user input and a model response.
- Sessions: Long-lived interactions that maintain a coherent context across multiple turns and requests.
- Model Adapters: Components that translate the generic MCP context into specific model API calls (e.g., OpenAI, Anthropic, local LLMs).
Designing Your Python MCP Server with FastAPI
When I set out to build robust backend services, FastAPI is my go-to for Python. Its performance, Pydantic integration, and native async support make it ideal for I/O-bound tasks like orchestrating AI models and external APIs. For a deeper dive into why FastAPI often beats out other frameworks in specific high-performance scenarios, I've previously written about it in my Ktor vs. FastAPI: A Backend Performance Deep Dive post. Here, we'll leverage FastAPI to create a stateless (from the HTTP request perspective) yet context-aware MCP server.
Server Architecture Overview
Our MCP server will expose several endpoints:
POST /mcp/session/create: Initializes a new MCP session, returning a unique session ID.POST /mcp/session/{session_id}/context: Adds or updates context within an existing session.POST /mcp/session/{session_id}/invoke: Invokes the underlying AI model with the current session context and a new user input.GET /mcp/session/{session_id}/status: Retrieves the current state or history of a session.
We'll need an in-memory or persistent store for session data. For simplicity, we'll start with an in-memory dictionary, but for production, you'd swap this out for Redis or MongoDB.
Implementing Core MCP Components
Let's define our Pydantic models for request and response bodies. These ensure strong typing and automatic validation, a massive time-saver.
# app/models.py
from pydantic import BaseModel, Field
from typing import Dict, Any, List, Literal, Optional
class ContextItem(BaseModel):
type: str = Field(..., description="Type of context (e.g., 'code', 'chat_message', 'github_issue')")
content: Any = Field(..., description="The actual content of the context item")
metadata: Dict[str, Any] = Field(default_factory=dict, description="Optional metadata")
class SessionCreateResponse(BaseModel):
session_id: str
class InvokeRequest(BaseModel):
user_input: str
model_config: Dict[str, Any] = Field(default_factory=dict, description="Specific model parameters")
class InvokeResponse(BaseModel):
model_response: str
session_id: str
context_updates: List[ContextItem] = Field(default_factory=list)
class GitHubIssueContext(ContextItem):
type: Literal["github_issue"] = "github_issue"
issue_id: int
title: str
body: str
state: str
labels: List[str]
url: str
class GitHubPRContext(ContextItem):
type: Literal["github_pr"] = "github_pr"
pr_id: int
title: str
body: str
state: str
url: str
files_changed: List[str]
class MCPContext(BaseModel):
items: List[ContextItem] = Field(default_factory=list)
class MCPSession(BaseModel):
session_id: str
context: MCPContext = Field(default_factory=MCPContext)
history: List[Dict[str, Any]] = Field(default_factory=list) # Stores turn history
Next, a simple session manager. In production, this would be backed by a proper database. If you're building a Discord bot that needs to maintain state across interactions, a similar session management pattern is crucial, as I discussed in Building a Discord Ticket Bot with Python.
# app/session_manager.py
import uuid
from typing import Dict, Optional
from app.models import MCPSession, ContextItem
class SessionManager:
def __init__(self):
self.sessions: Dict[str, MCPSession] = {}
def create_session(self) -> MCPSession:
session_id = str(uuid.uuid4())
session = MCPSession(session_id=session_id)
self.sessions[session_id] = session
return session
def get_session(self, session_id: str) -> Optional[MCPSession]:
return self.sessions.get(session_id)
def update_session_context(self, session_id: str, context_items: List[ContextItem]):
session = self.get_session(session_id)
if session:
for item in context_items:
# Simple update logic: append or replace based on a unique identifier
# For more complex scenarios, you'd check item.metadata for IDs
session.context.items.append(item)
# In a real system, you'd de-duplicate or intelligently merge context
return True
return False
def add_history_entry(self, session_id: str, entry: Dict[str, Any]):
session = self.get_session(session_id)
if session:
session.history.append(entry)
return True
return False
session_manager = SessionManager()
Integrating the GitHub API for Context
This is where the MCP server truly shines. Instead of dumping raw GitHub API responses into a model's prompt, we'll fetch relevant data and structure it as MCP ContextItems. This allows the model adapter to intelligently select and format the most pertinent information based on the current query.
We'll use httpx for async HTTP requests to the GitHub API. You'll need a GitHub Personal Access Token (PAT) with appropriate read permissions for the repositories you want to access. Never hardcode API keys! Use environment variables.
# app/github_integration.py
import httpx
import os
from typing import List, Dict, Any
from app.models import GitHubIssueContext, GitHubPRContext
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
GITHUB_API_BASE_URL = "https://api.github.com"
async def fetch_github_issues(repo_owner: str, repo_name: str, state: str = "open") -> List[GitHubIssueContext]:
headers = {"Authorization": f"token {GITHUB_TOKEN}", "Accept": "application/vnd.github.v3+json"}
url = f"{GITHUB_API_BASE_URL}/repos/{repo_owner}/{repo_name}/issues?state={state}"
async with httpx.AsyncClient() as client:
response = await client.get(url, headers=headers)
response.raise_for_status()
issues_data = response.json()
issues = []
for issue in issues_data:
if "pull_request" not in issue: # Filter out PRs, which GitHub treats as issues
issues.append(GitHubIssueContext(
issue_id=issue["number"],
title=issue["title"],
body=issue["body"] or "No description provided.",
state=issue["state"],
labels=[label["name"] for label in issue["labels"]],
url=issue["html_url"]
))
return issues
async def fetch_github_prs(repo_owner: str, repo_name: str, state: str = "open") -> List[GitHubPRContext]:
headers = {"Authorization": f"token {GITHUB_TOKEN}", "Accept": "application/vnd.github.v3+json"}
url = f"{GITHUB_API_BASE_URL}/repos/{repo_owner}/{repo_name}/pulls?state={state}"
async with httpx.AsyncClient() as client:
response = await client.get(url, headers=headers)
response.raise_for_status()
prs_data = response.json()
prs = []
for pr in prs_data:
# For simplicity, we'll just get files_changed count here, full details would need another API call
# For example: await client.get(pr["url"] + "/files", headers=headers)
prs.append(GitHubPRContext(
pr_id=pr["number"],
title=pr["title"],
body=pr["body"] or "No description provided.",
state=pr["state"],
url=pr["html_url"],
files_changed=[] # Placeholder, could be populated with a follow-up API call
))
return prs
# External Link: Refer to the official GitHub API documentation for more details:
# GitHub REST API Documentation
Model Adapter (Simulated)
For this guide, we'll use a simulated model. In a real-world scenario, this would be where you interface with OpenAI's API, Anthropic's API, or a local LLM via libraries like llama-cpp-python. The key is that the model adapter takes the MCP context and formats it into the specific prompt structure required by the LLM.
# app/model_adapter.py
from typing import List, Dict, Any
from app.models import ContextItem
class SimulatedModelAdapter:
def __init__(self, model_name: str = "Simulated-GPT-4"): # Or OpenAI/Anthropic
self.model_name = model_name
async def invoke(self, context: List[ContextItem], user_input: str, model_config: Dict[str, Any]) -> str:
# In a real scenario, this would involve calling an external LLM API
# and constructing a prompt based on the context.
# Simulate context processing:
context_summary = ""
for item in context:
if item.type == "github_issue":
context_summary += f"Issue {item.issue_id} ({item.state}): {item.title}. Body: {item.body[:100]}...\n"
elif item.type == "github_pr":
context_summary += f"PR {item.pr_id} ({item.state}): {item.title}. Body: {item.body[:100]}...\n"
# Add other context types here
if not context_summary:
context_summary = "No specific context provided."
response_prefix = f"({self.model_name}) Acknowledged context:\n{context_summary}\n"
# Simulate model response based on user input
if "summarize" in user_input.lower() and "github" in context_summary.lower():
return response_prefix + "Here's a summary of the GitHub items: Multiple open issues and PRs related to project development. Focus is on addressing bug fixes and new features."
elif "hello" in user_input.lower():
return response_prefix + "Hello there! How can I assist you with your project context today?"
else:
return response_prefix + f"Received your input: '{user_input}'. Please provide more specific instructions or context for a detailed response."
simulated_model_adapter = SimulatedModelAdapter()
FastAPI Application
Now, let's wire everything up in our main FastAPI application.
# app/main.py
from fastapi import FastAPI, HTTPException
from app.session_manager import session_manager
from app.models import SessionCreateResponse, InvokeRequest, InvokeResponse, ContextItem, GitHubIssueContext, GitHubPRContext
from app.github_integration import fetch_github_issues, fetch_github_prs
from app.model_adapter import simulated_model_adapter
from typing import List
app = FastAPI(title="Python MCP GitHub Context Server")
@app.post("/mcp/session/create", response_model=SessionCreateResponse, summary="Create a new MCP session")
async def create_mcp_session():
session = session_manager.create_session()
return SessionCreateResponse(session_id=session.session_id)
@app.post("/mcp/session/{session_id}/context", status_code=204, summary="Add or update context for an MCP session")
async def add_context_to_session(session_id: str, context_items: List[ContextItem]):
if not session_manager.get_session(session_id):
raise HTTPException(status_code=404, detail="Session not found")
session_manager.update_session_context(session_id, context_items)
# In a real system, you might trigger a model re-indexing of context here
return
@app.post("/mcp/session/{session_id}/github_context", status_code=204, summary="Fetch and add GitHub context")
async def add_github_context_to_session(session_id: str, repo_owner: str, repo_name: str, fetch_issues: bool = True, fetch_prs: bool = True):
if not session_manager.get_session(session_id):
raise HTTPException(status_code=404, detail="Session not found")
context_items = []
if fetch_issues:
issues = await fetch_github_issues(repo_owner, repo_name)
context_items.extend(issues)
if fetch_prs:
prs = await fetch_github_prs(repo_owner, repo_name)
context_items.extend(prs)
if context_items:
session_manager.update_session_context(session_id, context_items)
return
@app.post("/mcp/session/{session_id}/invoke", response_model=InvokeResponse, summary="Invoke AI model with session context")
async def invoke_model(session_id: str, request: InvokeRequest):
session = session_manager.get_session(session_id)
if not session:
raise HTTPException(status_code=404, detail="Session not found")
model_response = await simulated_model_adapter.invoke(
context=session.context.items,
user_input=request.user_input,
model_config=request.model_config
)
# Store interaction history
session_manager.add_history_entry(
session_id,
{"user_input": request.user_input, "model_response": model_response}
)
return InvokeResponse(model_response=model_response, session_id=session_id)
@app.get("/mcp/session/{session_id}/status", summary="Get MCP session status and history")
async def get_session_status(session_id: str):
session = session_manager.get_session(session_id)
if not session:
raise HTTPException(status_code=404, detail="Session not found")
return session
# External Link: For more on FastAPI, check its official documentation:
# FastAPI Documentation
Scalability, Performance, and Deployment
Running an MCP server, especially one that interacts with external APIs and potentially large language models, demands careful consideration for scalability and performance. FastAPI, with its asynchronous nature, is a great start. Here are a few points I've learned from deploying similar systems:
- Asynchronous I/O: FastAPI and
httpxare built onasyncio, which allows your server to handle many concurrent requests without blocking, crucial when waiting for GitHub or LLM API responses. - Worker Processes: For CPU-bound tasks (like complex context summarization if you implement it), run multiple Uvicorn worker processes. Gunicorn is excellent for managing these workers.
- Caching: Cache frequently accessed GitHub data or LLM responses (if appropriate) using Redis. This significantly reduces latency and API call costs.
- Database for Sessions: Replace the in-memory
SessionManagerwith a persistent solution like MongoDB (for flexible schema) or Redis (for high-speed key-value session storage). - Containerization: Dockerize your application. This simplifies deployment and ensures consistency across environments.
- Orchestration: For high availability and automatic scaling, deploy on Kubernetes or use managed container services.
For deployment, especially when you need dedicated resources, I highly recommend checking out cloud VPS providers like Vultr. Their high-performance compute instances offer fantastic value for money, and I've personally used them for various FastAPI and Ktor deployments, experiencing reliable performance and easy scaling. It's a solid choice for hosting your MCP server with optimal control.
When it comes to the technical depth required to truly optimize your backend and understand how underlying systems work, having a strong grasp of operating system internals is invaluable. It’s the same kind of deep dive I cover in my post Demystifying Android OS Internals, though focused on a different domain, the principles of system efficiency apply universally.
Example: Scaling with Uvicorn and Gunicorn
# To run with Uvicorn (for development)
uvicorn app.main:app --reload --port 8000
# To run with Gunicorn and Uvicorn workers (for production)
gunicorn app.main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
Security Best Practices
Any service interacting with sensitive data or external APIs needs robust security:
- Environment Variables: As mentioned, store API keys (GitHub PAT, LLM keys) in environment variables, never in code.
- Input Validation: FastAPI's Pydantic models handle much of this, but always be wary of malicious inputs, especially for free-text fields.
- Rate Limiting: Implement rate limiting on your API endpoints to prevent abuse and protect external API quotas. Fastapi-Limiter is a good library for this.
- HTTPS: Always deploy your server behind HTTPS using a reverse proxy like Nginx or Caddy.
- Least Privilege: Ensure your GitHub PAT only has the minimum necessary permissions.
Comparative Analysis: Traditional Context vs. MCP
Let's look at how MCP streamlines context management compared to traditional approaches, particularly when integrating external APIs like GitHub.
| Feature | Traditional Ad-hoc Context | Model Context Protocol (MCP) |
|---|---|---|
| Context Structure | Flat strings, unparsed blobs, manual concatenation. | Structured JSON objects, typed context items (e.g., GitHubIssueContext, ChatMessage). |
| Token Efficiency | Often redundant, re-sending full history, prone to hitting limits. | Intelligent context selection, differential updates, reduced token usage. |
| Developer Experience | Fragile, hard to debug, boilerplate context manipulation. | Standardized API, Pydantic validation, clear separation of concerns. |
| Scalability | Difficult to scale due to large prompt sizes and parsing overhead. | Designed for scale, optimized for context retrieval and update. |
| Model Adaptability | Requires custom prompt engineering for each model. | Abstracts model-specific prompt details via adapters. |
| GitHub Integration | Raw API responses directly into prompts, manual filtering. | Structured GitHub context items, easier for model to 'understand' specific fields. |
Need Help with Custom APIs or Backend Systems?
I build robust, secure, and scalable backend services, databases, and microservices using FastAPI, Ktor, Node.js, and MongoDB. Let's build your server infrastructure!
Written by
Hazrat Ummar Shaikh
Android Developer with 4+ years of experience. Built production Android apps, Ktor backends, Discord bots, and SaaS products using Kotlin, Python, and MongoDB. Passionate about building robust systems and writing clean code.
Related Posts

Unlock advanced AI integration with Model Context Protocol. I'll show you how to build a robust Python MCP server from scratch, leveraging the GitHub API for real-world context.

The Model Context Protocol is now standard for AI. I'll guide you through building a high-performance Python MCP server for GitHub API automation.

A weekend Python script I engineered saved a CA firm 209 hours during ITR season. I'll break down the FastAPI, MongoDB, and automation strategies that unlocked this massive efficiency gain.
