I've been in the trenches building high-performance API backends and custom Discord bots for years, and one recurring challenge always crops up: managing context for AI models effectively. I recall a specific incident where a custom moderation bot, built on Node.js, started suffering from inconsistent responses after processing a few thousand messages. The issue wasn't the model itself, but how we were feeding it context — a patchwork of ad-hoc session management and prompt concatenation. It was a nightmare to debug and even harder to scale. That's when I really started looking for a more structured approach, and the Model Context Protocol (MCP) began gaining traction.

MCP, which originated as a niche project, has truly evolved into an industry-standard for orchestrating model interactions, especially when dealing with long-running conversations or complex, evolving states. It's a game-changer for anyone serious about production AI applications, moving beyond simple one-shot prompts to genuinely stateful, context-aware interactions. And as a Pythonista, naturally, I gravitated towards building out MCP servers with FastAPI due to its performance and developer-friendly async capabilities.

Understanding the Model Context Protocol (MCP)

At its core, MCP provides a standardized way for an AI model client (your application) to manage and present context to a model server. Think of it as a sophisticated API for conversational memory. Instead of stuffing every past utterance or relevant piece of data into a single, ever-growing prompt string, MCP defines structured messages, turns, and context objects. This separation allows for more efficient token management, better model performance, and significantly improved maintainability of your AI applications.

Why is this critical? Imagine building a code-generation assistant that needs to understand your entire project's structure, past commit messages, and specific file contents. Without MCP, you'd be passing massive, redundant context blobs with every request, quickly hitting token limits and incurring unnecessary costs. With MCP, you establish a session, push relevant context (like file contents or GitHub issue history), and then only send differential updates or specific queries, letting the server intelligently manage the model's working memory.

The Pillars of MCP

Context Objects: Structured data payloads representing various pieces of information (e.g., chat history, document fragments, code snippets, user profiles).
Turns: Individual interactions within a session, typically a user input and a model response.
Sessions: Long-lived interactions that maintain a coherent context across multiple turns and requests.
Model Adapters: Components that translate the generic MCP context into specific model API calls (e.g., OpenAI, Anthropic, local LLMs).

Designing Your Python MCP Server with FastAPI

When I set out to build robust backend services, FastAPI is my go-to for Python. Its performance, Pydantic integration, and native async support make it ideal for I/O-bound tasks like orchestrating AI models and external APIs. For a deeper dive into why FastAPI often beats out other frameworks in specific high-performance scenarios, I've previously written about it in my Ktor vs. FastAPI: A Backend Performance Deep Dive post. Here, we'll leverage FastAPI to create a stateless (from the HTTP request perspective) yet context-aware MCP server.

Server Architecture Overview

Our MCP server will expose several endpoints:

POST /mcp/session/create: Initializes a new MCP session, returning a unique session ID.
POST /mcp/session/{session_id}/context: Adds or updates context within an existing session.
POST /mcp/session/{session_id}/invoke: Invokes the underlying AI model with the current session context and a new user input.
GET /mcp/session/{session_id}/status: Retrieves the current state or history of a session.

We'll need an in-memory or persistent store for session data. For simplicity, we'll start with an in-memory dictionary, but for production, you'd swap this out for Redis or MongoDB.

Implementing Core MCP Components

Let's define our Pydantic models for request and response bodies. These ensure strong typing and automatic validation, a massive time-saver.

# app/models.py
from pydantic import BaseModel, Field
from typing import Dict, Any, List, Literal, Optional

class ContextItem(BaseModel):
    type: str = Field(..., description="Type of context (e.g., 'code', 'chat_message', 'github_issue')")
    content: Any = Field(..., description="The actual content of the context item")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Optional metadata")

class SessionCreateResponse(BaseModel):
    session_id: str

class InvokeRequest(BaseModel):
    user_input: str
    model_config: Dict[str, Any] = Field(default_factory=dict, description="Specific model parameters")

class InvokeResponse(BaseModel):
    model_response: str
    session_id: str
    context_updates: List[ContextItem] = Field(default_factory=list)

class GitHubIssueContext(ContextItem):
    type: Literal["github_issue"] = "github_issue"
    issue_id: int
    title: str
    body: str
    state: str
    labels: List[str]
    url: str

class GitHubPRContext(ContextItem):
    type: Literal["github_pr"] = "github_pr"
    pr_id: int
    title: str
    body: str
    state: str
    url: str
    files_changed: List[str]

class MCPContext(BaseModel):
    items: List[ContextItem] = Field(default_factory=list)

class MCPSession(BaseModel):
    session_id: str
    context: MCPContext = Field(default_factory=MCPContext)
    history: List[Dict[str, Any]] = Field(default_factory=list) # Stores turn history

Next, a simple session manager. In production, this would be backed by a proper database. If you're building a Discord bot that needs to maintain state across interactions, a similar session management pattern is crucial, as I discussed in Building a Discord Ticket Bot with Python.

# app/session_manager.py
import uuid
from typing import Dict, Optional
from app.models import MCPSession, ContextItem

class SessionManager:
    def __init__(self):
        self.sessions: Dict[str, MCPSession] = {}

    def create_session(self) -> MCPSession:
        session_id = str(uuid.uuid4())
        session = MCPSession(session_id=session_id)
        self.sessions[session_id] = session
        return session

    def get_session(self, session_id: str) -> Optional[MCPSession]:
        return self.sessions.get(session_id)

    def update_session_context(self, session_id: str, context_items: List[ContextItem]):
        session = self.get_session(session_id)
        if session:
            for item in context_items:
                # Simple update logic: append or replace based on a unique identifier
                # For more complex scenarios, you'd check item.metadata for IDs
                session.context.items.append(item) 
            # In a real system, you'd de-duplicate or intelligently merge context
            return True
        return False

    def add_history_entry(self, session_id: str, entry: Dict[str, Any]):
        session = self.get_session(session_id)
        if session:
            session.history.append(entry)
            return True
        return False


session_manager = SessionManager()

Cyberpunk workspace aesthetic of a developer coding on a holographic terminal, displaying Python code and FastAPI routes

Integrating the GitHub API for Context

This is where the MCP server truly shines. Instead of dumping raw GitHub API responses into a model's prompt, we'll fetch relevant data and structure it as MCP ContextItems. This allows the model adapter to intelligently select and format the most pertinent information based on the current query.

We'll use httpx for async HTTP requests to the GitHub API. You'll need a GitHub Personal Access Token (PAT) with appropriate read permissions for the repositories you want to access. Never hardcode API keys! Use environment variables.

# app/github_integration.py
import httpx
import os
from typing import List, Dict, Any
from app.models import GitHubIssueContext, GitHubPRContext

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
GITHUB_API_BASE_URL = "https://api.github.com"

async def fetch_github_issues(repo_owner: str, repo_name: str, state: str = "open") -> List[GitHubIssueContext]:
    headers = {"Authorization": f"token {GITHUB_TOKEN}", "Accept": "application/vnd.github.v3+json"}
    url = f"{GITHUB_API_BASE_URL}/repos/{repo_owner}/{repo_name}/issues?state={state}"
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()
        issues_data = response.json()
        
        issues = []
        for issue in issues_data:
            if "pull_request" not in issue: # Filter out PRs, which GitHub treats as issues
                issues.append(GitHubIssueContext(
                    issue_id=issue["number"],
                    title=issue["title"],
                    body=issue["body"] or "No description provided.",
                    state=issue["state"],
                    labels=[label["name"] for label in issue["labels"]],
                    url=issue["html_url"]
                ))
        return issues

async def fetch_github_prs(repo_owner: str, repo_name: str, state: str = "open") -> List[GitHubPRContext]:
    headers = {"Authorization": f"token {GITHUB_TOKEN}", "Accept": "application/vnd.github.v3+json"}
    url = f"{GITHUB_API_BASE_URL}/repos/{repo_owner}/{repo_name}/pulls?state={state}"
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()
        prs_data = response.json()
        
        prs = []
        for pr in prs_data:
            # For simplicity, we'll just get files_changed count here, full details would need another API call
            # For example: await client.get(pr["url"] + "/files", headers=headers)
            prs.append(GitHubPRContext(
                pr_id=pr["number"],
                title=pr["title"],
                body=pr["body"] or "No description provided.",
                state=pr["state"],
                url=pr["html_url"],
                files_changed=[] # Placeholder, could be populated with a follow-up API call
            ))
        return prs

# External Link: Refer to the official GitHub API documentation for more details:
# GitHub REST API Documentation

Detailed high-tech concept illustration of a server rack with glowing data streams and connections, representing secure

Model Adapter (Simulated)

For this guide, we'll use a simulated model. In a real-world scenario, this would be where you interface with OpenAI's API, Anthropic's API, or a local LLM via libraries like llama-cpp-python. The key is that the model adapter takes the MCP context and formats it into the specific prompt structure required by the LLM.

# app/model_adapter.py
from typing import List, Dict, Any
from app.models import ContextItem

class SimulatedModelAdapter:
    def __init__(self, model_name: str = "Simulated-GPT-4"): # Or OpenAI/Anthropic
        self.model_name = model_name

    async def invoke(self, context: List[ContextItem], user_input: str, model_config: Dict[str, Any]) -> str:
        # In a real scenario, this would involve calling an external LLM API
        # and constructing a prompt based on the context.
        
        # Simulate context processing:
        context_summary = ""
        for item in context:
            if item.type == "github_issue":
                context_summary += f"Issue {item.issue_id} ({item.state}): {item.title}. Body: {item.body[:100]}...\n"
            elif item.type == "github_pr":
                context_summary += f"PR {item.pr_id} ({item.state}): {item.title}. Body: {item.body[:100]}...\n"
            # Add other context types here
            
        if not context_summary:
            context_summary = "No specific context provided."

        response_prefix = f"({self.model_name}) Acknowledged context:\n{context_summary}\n"
        
        # Simulate model response based on user input
        if "summarize" in user_input.lower() and "github" in context_summary.lower():
            return response_prefix + "Here's a summary of the GitHub items: Multiple open issues and PRs related to project development. Focus is on addressing bug fixes and new features."
        elif "hello" in user_input.lower():
            return response_prefix + "Hello there! How can I assist you with your project context today?"
        else:
            return response_prefix + f"Received your input: '{user_input}'. Please provide more specific instructions or context for a detailed response."

simulated_model_adapter = SimulatedModelAdapter()

FastAPI Application

Now, let's wire everything up in our main FastAPI application.

# app/main.py
from fastapi import FastAPI, HTTPException
from app.session_manager import session_manager
from app.models import SessionCreateResponse, InvokeRequest, InvokeResponse, ContextItem, GitHubIssueContext, GitHubPRContext
from app.github_integration import fetch_github_issues, fetch_github_prs
from app.model_adapter import simulated_model_adapter
from typing import List

app = FastAPI(title="Python MCP GitHub Context Server")

@app.post("/mcp/session/create", response_model=SessionCreateResponse, summary="Create a new MCP session")
async def create_mcp_session():
    session = session_manager.create_session()
    return SessionCreateResponse(session_id=session.session_id)

@app.post("/mcp/session/{session_id}/context", status_code=204, summary="Add or update context for an MCP session")
async def add_context_to_session(session_id: str, context_items: List[ContextItem]):
    if not session_manager.get_session(session_id):
        raise HTTPException(status_code=404, detail="Session not found")
    session_manager.update_session_context(session_id, context_items)
    # In a real system, you might trigger a model re-indexing of context here
    return

@app.post("/mcp/session/{session_id}/github_context", status_code=204, summary="Fetch and add GitHub context")
async def add_github_context_to_session(session_id: str, repo_owner: str, repo_name: str, fetch_issues: bool = True, fetch_prs: bool = True):
    if not session_manager.get_session(session_id):
        raise HTTPException(status_code=404, detail="Session not found")
    
    context_items = []
    if fetch_issues:
        issues = await fetch_github_issues(repo_owner, repo_name)
        context_items.extend(issues)
    if fetch_prs:
        prs = await fetch_github_prs(repo_owner, repo_name)
        context_items.extend(prs)

    if context_items:
        session_manager.update_session_context(session_id, context_items)
    return

@app.post("/mcp/session/{session_id}/invoke", response_model=InvokeResponse, summary="Invoke AI model with session context")
async def invoke_model(session_id: str, request: InvokeRequest):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    model_response = await simulated_model_adapter.invoke(
        context=session.context.items,
        user_input=request.user_input,
        model_config=request.model_config
    )
    
    # Store interaction history
    session_manager.add_history_entry(
        session_id, 
        {"user_input": request.user_input, "model_response": model_response}
    )

    return InvokeResponse(model_response=model_response, session_id=session_id)

@app.get("/mcp/session/{session_id}/status", summary="Get MCP session status and history")
async def get_session_status(session_id: str):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")
    return session

# External Link: For more on FastAPI, check its official documentation:
# FastAPI Documentation

Scalability, Performance, and Deployment

Running an MCP server, especially one that interacts with external APIs and potentially large language models, demands careful consideration for scalability and performance. FastAPI, with its asynchronous nature, is a great start. Here are a few points I've learned from deploying similar systems:

Asynchronous I/O: FastAPI and httpx are built on asyncio, which allows your server to handle many concurrent requests without blocking, crucial when waiting for GitHub or LLM API responses.
Worker Processes: For CPU-bound tasks (like complex context summarization if you implement it), run multiple Uvicorn worker processes. Gunicorn is excellent for managing these workers.
Caching: Cache frequently accessed GitHub data or LLM responses (if appropriate) using Redis. This significantly reduces latency and API call costs.
Database for Sessions: Replace the in-memory SessionManager with a persistent solution like MongoDB (for flexible schema) or Redis (for high-speed key-value session storage).
Containerization: Dockerize your application. This simplifies deployment and ensures consistency across environments.
Orchestration: For high availability and automatic scaling, deploy on Kubernetes or use managed container services.

For deployment, especially when you need dedicated resources, I highly recommend checking out cloud VPS providers like Vultr. Their high-performance compute instances offer fantastic value for money, and I've personally used them for various FastAPI and Ktor deployments, experiencing reliable performance and easy scaling. It's a solid choice for hosting your MCP server with optimal control.

When it comes to the technical depth required to truly optimize your backend and understand how underlying systems work, having a strong grasp of operating system internals is invaluable. It’s the same kind of deep dive I cover in my post Demystifying Android OS Internals, though focused on a different domain, the principles of system efficiency apply universally.

Example: Scaling with Uvicorn and Gunicorn

# To run with Uvicorn (for development)
uvicorn app.main:app --reload --port 8000

# To run with Gunicorn and Uvicorn workers (for production)
gunicorn app.main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Security Best Practices

Any service interacting with sensitive data or external APIs needs robust security:

Environment Variables: As mentioned, store API keys (GitHub PAT, LLM keys) in environment variables, never in code.
Input Validation: FastAPI's Pydantic models handle much of this, but always be wary of malicious inputs, especially for free-text fields.
Rate Limiting: Implement rate limiting on your API endpoints to prevent abuse and protect external API quotas. Fastapi-Limiter is a good library for this.
HTTPS: Always deploy your server behind HTTPS using a reverse proxy like Nginx or Caddy.
Least Privilege: Ensure your GitHub PAT only has the minimum necessary permissions.

Comparative Analysis: Traditional Context vs. MCP

Let's look at how MCP streamlines context management compared to traditional approaches, particularly when integrating external APIs like GitHub.

Feature	Traditional Ad-hoc Context	Model Context Protocol (MCP)
Context Structure	Flat strings, unparsed blobs, manual concatenation.	Structured JSON objects, typed context items (e.g., GitHubIssueContext, ChatMessage).
Token Efficiency	Often redundant, re-sending full history, prone to hitting limits.	Intelligent context selection, differential updates, reduced token usage.
Developer Experience	Fragile, hard to debug, boilerplate context manipulation.	Standardized API, Pydantic validation, clear separation of concerns.
Scalability	Difficult to scale due to large prompt sizes and parsing overhead.	Designed for scale, optimized for context retrieval and update.
Model Adaptability	Requires custom prompt engineering for each model.	Abstracts model-specific prompt details via adapters.
GitHub Integration	Raw API responses directly into prompts, manual filtering.	Structured GitHub context items, easier for model to 'understand' specific fields.

#python #ai #mcp #developertools

Need Help with Custom APIs or Backend Systems?

I build robust, secure, and scalable backend services, databases, and microservices using FastAPI, Ktor, Node.js, and MongoDB. Let's build your server infrastructure!

Written by

Hazrat Ummar Shaikh

Android Developer with 4+ years of experience. Built production Android apps, Ktor backends, Discord bots, and SaaS products using Kotlin, Python, and MongoDB. Passionate about building robust systems and writing clean code.

Build a Python MCP Server: GitHub API Context Management