Kaval.AI is an opinionated, elegant, production-grade library for building LLM-powered workflows, chatbots and agents, and connecting them to databases and tooling — with a focus on observability and debuggability.

Kaval.AI uses Pydantic for every workflow input and output, and for all intermediate steps including tool calls.

from pydantic import BaseModel

class Message(BaseModel):
    message: str

class Reply(BaseModel):
    agent_response: str
    choices: list[str]

Using structured data types makes developing agentic AI workflows pleasant and predictable. The simplest way to create a workflow is the WorkflowBuilder:

from kavalai.workflow import WorkflowBuilder, InMemoryDataStorage

# A short example or two helps a tiny model answer well and fill the choices
# in the right shape; the structured output schema enforces the format.
prompt = """You are a friendly, concise chatbot. Reply to the user in
agent_response, and suggest up to 3 short quick-reply choices the user might
tap next. Use earlier messages in the conversation for context.

Example:
user: hi
agent_response: Hey there! What can I help you with today?
choices: ["What can you do?", "Tell me a joke", "Give me an idea"]
"""

workflow = (
    WorkflowBuilder("Chatbot", llm_model="browser/Llama-3.2-1B-Instruct-q4f32_1-MLC")
    .data_model("input", Message)
    .data_model("output", Reply)
    .start("reply")
    .llm(
        "reply",
        prompt=prompt,
        inputs={"message": "input"},
        output="output",
        next="end",
    )
    .end()
    .build_engine(storage=InMemoryDataStorage())
)

Let’s go over the code step-by-step.

WorkflowBuilder("Chatbot", llm_model="browser/Llama-3.2-1B-Instruct-q4f32_1-MLC")

This creates the builder for a workflow named Chatbot. The llm_model argument is the default model every LLM node uses unless it overrides it. Kaval.AI supports several providers — OpenAI, Gemini, Ollama and an in-browser WebGPU provider; see LLM clients.

Here we use the browser provider so a small model runs right in this page with no API key, backed by the in-browser BrowserLLMClient (see LLM Clients API). Llama-3.2-1B-Instruct-q4f32_1-MLC is a tiny model that downloads on first use; the live playground below also lets you pick a different one from its dropdown.

.data_model("input", Message)
.data_model("output", Reply)

data_model registers a Pydantic model as a workflow data type. input and output are special: they are the workflow’s overall input and output types. Declaring them explicitly is what lets the runner validate every value and ask the model for exactly the right shape.

.start("reply")

This names the first node to run — here, the reply node defined next.

.llm(
    "reply",
    prompt=prompt,
    inputs={"message": "input"},
    output="output",
    next="end",
)

This adds an LLM node named reply. inputs maps a local name to a value from the run context: {"message": "input"} takes the workflow’s input and hands it to the prompt under the name message, so the model sees the message text next to your instructions. output="output" stores the model’s answer — validated against Reply — under output, and next="end" says which node runs after this one.

.build_engine(storage=InMemoryDataStorage())

Finally, build_engine validates the graph and returns a ready-to-run WorkflowEngine. Passing storage gives the bot memory: LLM nodes read the conversation history by default (use_history=True), so each turn sees what was said before. The chat keeps one session and InMemoryDataStorage holds that history for the page’s lifetime; in production you would point it at a database instead.

Here is that chatbot, running entirely in your browser. The first run takes a moment to download the model (or load it from cache), and because the model is tiny its answers will vary — but it remembers the conversation, and its reply is a structured Reply whose choices become the quick-reply buttons.

Under the hood that chatbot is a tiny workflow — one LLM node between start and end:

The chatbot workflow: start → reply (an LLM node) → end.

This is just a small glimpse of what Kaval.AI can do. Check out the tutorials and examples — they cover a wide range of topics for building production-grade agentic workflows.

Get started¶

Kaval.AI is more than a workflow engine. It ships native LLM clients for OpenAI, Gemini and Ollama behind one streaming interface (plus the in-browser WebGPU client you just used); a FunctionKernel that exposes Python, REST and MCP tools to your agents through a single validated interface; a RagService for indexing and querying embeddings, so agents can answer from your own documents (retrieval-augmented generation); and a backoffice UI for configuring agents and monitoring conversations, runs, tasks, token usage and cost. The guides, tutorials and API reference below cover each of these in depth.

Installation

Learn

Operate

Using the Backoffice UI

Reference

API Reference

Get started¶

Indices and tables¶