Sampling

Sampling lets an MCP Server ask the MCP Client to call an LLM on its behalf. This is useful when your server needs AI-generated content (like a summary or analysis) but shouldn’t — or can’t — call an LLM directly. The client, which already has access to an LLM, handles the request and returns the result.

When to use sampling

A concrete example: a blog post creation tool that also needs a generated abstract. The server has all the content, but the LLM lives on the client side.
1. User → MCP Client: "Author blog post"
2. MCP Client → MCP Server: tool call (create_blog)
3. MCP Server → MCP Client: sampling/createMessage (create summary)
4. MCP Client → LLM: generate abstract
5. LLM → MCP Client: abstract text
6. MCP Client → MCP Server: sampling response (abstract)
7. MCP Server → MCP Client: complete blog post (draft + abstract)
8. MCP Client → User: blog post ready

The sampling request

The server sends a sampling/createMessage JSON-RPC request to the client:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Create a blog post summary of the following blog post: <BLOG POST>"
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        { "name": "claude-3-sonnet" }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}

Key fields

Field                                   Description
messages                                The conversation messages to send to the LLM
modelPreferences.hints                  Preferred models (the client may use a different one)
modelPreferences.intelligencePriority   0–1 scale; higher = prefer a smarter model
modelPreferences.speedPriority          0–1 scale; higher = prefer a faster model
systemPrompt                            System instruction for the LLM
maxTokens                               Recommended token limit for the response
Model preferences are recommendations only. The user (via the client) can choose a different model. Your server code must handle responses from any model.
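
If you build the server with the Python SDK, these preferences map onto the model_preferences argument of create_message. A minimal sketch inside a FastMCP tool (request_summary is a hypothetical helper; the full tool implementation appears below), using the SDK's ModelPreferences and ModelHint types from mcp.types:

from mcp.server.fastmcp import Context
from mcp.server.session import ServerSession
from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent

async def request_summary(text: str, ctx: Context[ServerSession, None]) -> str:
    # The hints and priorities mirror the JSON request above;
    # the client is still free to substitute a different model.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(role="user", content=TextContent(type="text", text=text))
        ],
        max_tokens=100,
        model_preferences=ModelPreferences(
            hints=[ModelHint(name="claude-3-sonnet")],
            intelligencePriority=0.8,
            speedPriority=0.5,
        ),
    )
    # Guard against non-text content before reading .text
    return result.content.text if isinstance(result.content, TextContent) else ""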

The sampling response

After the client calls the LLM, it sends the result back to the server:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "Here's your abstract: <ABSTRACT>"
    },
    "model": "gpt-5",
    "stopReason": "endTurn"
  }
}
Note: the model in the response may differ from the one you hinted at. In this example the user (via the client) chose gpt-5 instead of claude-3-sonnet.

Message content types

Sampling messages support text, images, and audio:
{
  "type": "text",
  "text": "The message content"
}
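
Image and audio content instead carry base64-encoded data plus a MIME type (shapes per the MCP specification):
{
  "type": "image",
  "data": "<base64-encoded image data>",
  "mimeType": "image/jpeg"
}

{
  "type": "audio",
  "data": "<base64-encoded audio data>",
  "mimeType": "audio/wav"
}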

Implementing a sampling server (Python)

Here’s a complete blog post tool that uses sampling to generate an abstract:
from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from mcp.types import SamplingMessage, TextContent
from pydantic import BaseModel
import json

mcp = FastMCP("Blog post generator")
posts = []

class BlogPost(BaseModel):
    id: int
    title: str
    content: str
    abstract: str = ""

@mcp.tool()
async def create_blog(
    title: str,
    content: str,
    ctx: Context[ServerSession, None]
) -> str:
    """Create a blog post and generate a summary using sampling."""

    # Step 1: Create the blog post object
    post = BlogPost(
        id=len(posts) + 1,
        title=title,
        content=content,
        abstract=""
    )

    # Step 2: Send a sampling request to the client
    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content}"

    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    # Step 3: Use the LLM response as the abstract
    # (guard against non-text content, since any model may answer)
    post.abstract = (
        result.content.text if isinstance(result.content, TextContent) else ""
    )
    posts.append(post)

    # Step 4: Return the complete post
    return json.dumps({
        "id": post.id,
        "title": post.title,
        "abstract": post.abstract
    })
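
To run the server as a script, start it with the SDK's default stdio transport:

if __name__ == "__main__":
    mcp.run()  # FastMCP defaults to the stdio transport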

Enabling sampling in the client

If you are also building the client (not just the server), declare sampling support in client capabilities:
{
  "capabilities": {
    "sampling": {}
  }
}
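With the Python SDK, the usual way to provide this capability is to pass a sampling callback to ClientSession; the SDK then advertises sampling during initialization. A minimal sketch, where call_my_llm is a hypothetical helper standing in for whatever LLM API the client uses:

from typing import Any

from mcp import ClientSession, types
from mcp.shared.context import RequestContext

async def call_my_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: forward the prompt to your LLM

async def sampling_callback(
    context: RequestContext[ClientSession, Any],
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    # Pull the text out of the last message and hand it to the LLM.
    last = params.messages[-1].content
    prompt = last.text if isinstance(last, types.TextContent) else ""
    answer = await call_my_llm(prompt)
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text=answer),
        model="my-model",  # report whichever model actually ran
        stopReason="endTurn",
    )

# session = ClientSession(read_stream, write_stream, sampling_callback=sampling_callback)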
If you are only building the MCP Server, you don't need to configure anything on the client side: the host application (Claude Desktop, VS Code, etc.) answers your server's sampling requests automatically.

Key takeaways

  • Sampling lets a server delegate LLM calls to the client — the server sends a sampling/createMessage request and the client calls the LLM and returns the result.
  • Model preferences are recommendations; the client and user choose the actual model used.
  • Sampling messages support text, image, and audio content types.
  • The server uses ctx.session.create_message() (Python) to issue sampling requests from within a tool.
  • This pattern is only available with the low-level server API or via the Context object in FastMCP.