LLM Clients API¶
kavalai.llm_clients provides a unified, observable interface over LLM and
embedding providers. Every call returns a ModelCallStat with
token usage and timing, and structured output is validated against a Pydantic
response_model.
Run in the browser¶
The browser/ provider runs a model entirely client-side over WebGPU — no API
key, no server, no CORS. The same make_client() /
make_embedding_client() factories you use on the server return a
BrowserLLMClient / BrowserEmbeddingClient,
so your code is identical apart from the provider/model string. The two
snippets below have a Run in browser ▶ button (the model id comes from the
panel’s dropdown):
from kavalai import make_client
client = make_client(f"browser/{KAVAL_BROWSER_MODEL}")
colours = await client.prompt("Name the three primary colours, comma-separated.")
print(colours)
Embeddings work the same way. Embedding models are distinct from chat models;
KAVAL_BROWSER_EMBED_MODEL is a small, full-precision Snowflake Arctic model:
from kavalai import make_embedding_client
client = make_embedding_client(f"browser/{KAVAL_BROWSER_EMBED_MODEL}")
texts = [
"Tallinn is the capital of Estonia.",
"Estonia's capital city is Tallinn.",
"I had pasta for dinner last night.",
]
vectors, stats = await client.compute_embeddings(texts, normalize=True)
print(f"{len(vectors)} vectors of dimension {len(vectors[0])}")
# Vectors are L2-normalised, so cosine similarity is just their dot product.
def similarity(a, b):
return sum(x * y for x, y in zip(a, b))
print(f"sim(0, 1) = {similarity(vectors[0], vectors[1]):.3f} # same meaning")
print(f"sim(0, 2) = {similarity(vectors[0], vectors[2]):.3f} # unrelated")
Note
browser/ models need a WebGPU-capable browser (recent Chrome/Edge, or
Firefox with dom.webgpu.enabled). The model downloads on first use and is
cached by the browser. Outside the browser, use an openai/, gemini/ or
ollama/ model instead.
Base client and models¶
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
class kavalai.llm_clients.base_client.LlmClientParameters(*, temperature: float | None =
1.0, top_p: float | None =0.2, reasoning_effort: str | None =None, service_tier: str | None =None, timeout_seconds: float | None =30.0)[source]¶ Bases:
BaseModel-
model_config : ClassVar[ConfigDict] =
{}¶ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
model_config : ClassVar[ConfigDict] =
-
class kavalai.llm_clients.base_client.ChatMessage(*, role: str | None =
None, type: str | None =None, content: str | None =None)[source]¶ Bases:
BaseModelStandard chat completion message.
-
model_config : ClassVar[ConfigDict] =
{}¶ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
model_config : ClassVar[ConfigDict] =
- class kavalai.llm_clients.base_client.ChatHistory(*, messages: list[ChatMessage])[source]¶
Bases:
BaseModel- messages : list[ChatMessage]¶
-
model_config : ClassVar[ConfigDict] =
{}¶ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
class kavalai.llm_clients.base_client.ModelCallStat(*, call_type: 'llm' | 'embedding', model: str | None =
None, request_data: str | None =None, response_data: str | None =None, response_code: int | None =None, prompt_tokens: int | None =None, completion_tokens: int | None =None, total_tokens: int | None =None, batch_size: int | None =None, duration_seconds: float | None =None)[source]¶ Bases:
BaseModel-
model_config : ClassVar[ConfigDict] =
{}¶ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
model_config : ClassVar[ConfigDict] =
- class kavalai.llm_clients.base_client.ModelStatsReceiver[source]¶
Bases:
object- receive_model_stats(stats: ModelCallStat)[source]¶
-
class kavalai.llm_clients.base_client.ModelStatsLogger(format_str: str | None =
None)[source]¶ Bases:
ModelStatsReceiverLogs model call statistics using a configurable format.
- receive_model_stats(stats: ModelCallStat)[source]¶
-
class kavalai.llm_clients.base_client.BaseLlmClient(llm_client_parameters: LlmClientParameters | None =
None, model_stats_receiver: ModelStatsReceiver | None =None)[source]¶ Bases:
object-
async stream_chat_completions(*, chat_history: ChatHistory, response_model: type[BaseModel] | None =
None) Streamer[source]¶ Execute a chat completion and return a Streamer.
-
async chat_completions(*, chat_history: ChatHistory, response_model: type[BaseModel] | None =
None)[source]¶
-
async stream_prompt(system_message: str, response_model: type[BaseModel] | None =
None) Streamer[source]¶
-
async prompt(system_message: str, response_model: type[BaseModel] | None =
None)[source]¶
-
async stream_chat_completions(*, chat_history: ChatHistory, response_model: type[BaseModel] | None =
- exception kavalai.llm_clients.base_client.LlmClientException[source]¶
Bases:
RuntimeError
Provider clients¶
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
class kavalai.llm_clients.openai_client.OpenAIClient(model: str, llm_client_parameters: LlmClientParameters | None =
None, model_stats_receiver: ModelStatsReceiver | None =None, api_key: str | None =None, base_url: str | None =None)[source]¶ Bases:
BaseLlmClientOpenAI LLM client implementation using the Responses API and Streamer.
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
class kavalai.llm_clients.gemini_client.GeminiClient(model: str, llm_client_parameters: LlmClientParameters | None =
None, model_stats_receiver: ModelStatsReceiver | None =None, api_key: str | None =None)[source]¶ Bases:
BaseLlmClientGemini LLM client implementation using the Streamer.
- kavalai.llm_clients.gemini_client.convert_messages(messages: list[dict[str, Any]]) tuple[str | None, list[Content]][source]¶
- kavalai.llm_clients.gemini_client.remove_additional_properties(schema: dict[str, Any]) None[source]¶
Recursively remove ‘additionalProperties’ from a JSON schema. Gemini’s API doesn’t support this field.
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
class kavalai.llm_clients.ollama_client.OllamaClient(model: str, llm_client_parameters: LlmClientParameters | None =
None, model_stats_receiver: ModelStatsReceiver | None =None, host: str | None =None)[source]¶ Bases:
BaseLlmClientOllama LLM client implementation using the Streamer.
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- kavalai.llm_clients.browser_client.get_browser_bridge()[source]¶
Return the page’s JS bridge object (
window.kavalBrowserLLM).Shared by the browser LLM client (
.chat) and the browser embedding client (.embed). Raises a helpfulLlmClientExceptionwhen not running under Pyodide, or when the page has not loaded a bridge.
-
class kavalai.llm_clients.browser_client.BrowserLLMClient(model: str, llm_client_parameters: LlmClientParameters | None =
None, model_stats_receiver: ModelStatsReceiver | None =None)[source]¶ Bases:
BaseLlmClientLLM client that runs entirely in the browser, with no network calls.
Inference happens inside the page through a tiny JavaScript bridge exposed on
window.kavalBrowserLLM, typically backed by a WebGPU engine such as WebLLM. This makes Kaval.AI’s LLM nodes usable inside Pyodide with no API key, no provider account and no CORS constraints — the model is downloaded once and cached by the browser.Use it through
make_client("browser/<model-id>")or construct it directly.<model-id>is passed verbatim to the bridge (e.g. a WebLLM model id likeLlama-3.2-1B-Instruct-q4f32_1-MLC).The bridge contract is a single async function:
window.kavalBrowserLLM.chat(requestJson) -> Promise<resultJson>where
requestJsonis a JSON string of{model, messages, temperature, top_p, response_format?}andresultJsonis a JSON string of either{content, usage}or{error}. Exchanging plain JSON strings keeps the Python<->JS boundary free of proxy-conversion surprises.
Embeddings¶
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- class kavalai.llm_clients.embeddings.BaseEmbeddingClient(model: str)[source]¶
Bases:
objectCommon interface for v2 embedding clients.
The model name is bound at construction (the factory splits the
provider/modelstring), socompute_embeddingsonly takes the texts. Implementations return the embeddings plus a database-readyModelCallStat(the ORM row) so callers such asRagServicecan persist usage directly.
-
class kavalai.llm_clients.embeddings.OpenAIEmbeddingClient(model: str, api_key: str | None =
None, base_url: str | None =None, timeout: float =30.0)[source]¶ Bases:
BaseEmbeddingClientOpenAI embeddings (e.g.
text-embedding-3-small).
-
class kavalai.llm_clients.embeddings.GeminiEmbeddingClient(model: str, api_key: str | None =
None)[source]¶ Bases:
BaseEmbeddingClientGoogle Gemini embeddings.
-
class kavalai.llm_clients.embeddings.OllamaEmbeddingClient(model: str, host: str | None =
None, timeout: float =30.0)[source]¶ Bases:
BaseEmbeddingClientOllama (local) embeddings.
-
class kavalai.llm_clients.embeddings.FastEmbedClient(model: str, cache_dir: str | None =
None, threads: int | None =None, **kwargs)[source]¶ Bases:
BaseEmbeddingClientLocal embeddings via FastEmbed / ONNX Runtime (no API key).
- class kavalai.llm_clients.embeddings.BrowserEmbeddingClient(model: str)[source]¶
Bases:
BaseEmbeddingClientIn-browser embeddings via the WebLLM bridge (Pyodide only, no API key).
Mirrors
BrowserLLMClient: inference happens inside the page throughwindow.kavalBrowserLLM, here via its asyncembedfunction:window.kavalBrowserLLM.embed(requestJson) -> Promise<resultJson>where
requestJsonis a JSON string of{model, input}(inputis the list of texts) andresultJsonis a JSON string of either{embeddings, usage}or{error}. The model is downloaded once and cached by the browser — no API key, no provider account, no CORS.Use it through
make_embedding_client("browser/<model-id>");<model-id>is passed verbatim to the bridge (e.g. a WebLLM embedding id likesnowflake-arctic-embed-m-q0f32-MLC-b4).
- kavalai.llm_clients.embeddings.make_embedding_client(model: str) BaseEmbeddingClient[source]¶
Construct a v2 embedding client from a
provider/modelstring.Supported providers:
openai,gemini,ollama,fastembed,browser. The provider is split off and the remainder (which may itself contain slashes, e.g.fastembed/BAAI/bge-small-en-v1.5) is the model name. Thebrowserprovider runs entirely client-side via a WebLLM bridge (Pyodide only) and needs no API key.
Streaming¶
Copyright 2026 OÜ KAVAL AI (registry code 17393877)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- exception kavalai.llm_clients.streamer.StreamerTimeoutException(names: list[str], timeout_seconds: float)[source]¶
Bases:
ExceptionRaised when no stream chunk arrives within the configured timeout.
Reported by
Streamerwhile waiting on its queue when atimeout_secondsis set;nameslists the streamers still active when the timeout elapsed.
-
class kavalai.llm_clients.streamer.StreamContent(*, type: str, name: str, value: str | None =
None)[source]¶ Bases:
BaseModelStreamContentrepresents a streamed message from aStreamer.- Variables:¶
-
model_config : ClassVar[ConfigDict] =
{}¶ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
class kavalai.llm_clients.streamer.ValueStreamer(name: str, queue: Queue, response_model: type[BaseModel] | None =
None, stream_delta: bool =False, on_complete_callback: callable | None =None)[source]¶ Bases:
objectA helper class to manage and push streaming content to an asyncio queue.
- Variables:¶
- get_safe_value() str[source]¶
Safely parse and return the buffered content as JSON string if response_model is set, otherwise return as string.
-
class kavalai.llm_clients.streamer.Streamer(stream_delta: bool =
False, timeout_seconds: float | None =None)[source]¶ Bases:
object- property queue : Queue¶
-
get_value_streamer(name: str, stream_delta: bool | None =
None, response_model: type[BaseModel] | None =None) ValueStreamer[source]¶