# [Pre-Release] v1.79.0-stable - Search APIs
## Deploy this version

**Docker**

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.79.0.rc.1
```

**Pip**

```shell
pip install litellm==1.79.0
```
## Major Changes
- Cohere models will now be routed to Cohere v2 API by default - PR #15722
 
## Key Highlights

- **Search APIs** - Native `/v1/search` endpoint with support for Perplexity, Tavily, Parallel AI, Exa AI, DataforSEO, and Google PSE, with cost tracking
- **Vector Stores** - Vertex AI Search API integration as a vector store through LiteLLM, with passthrough endpoint support
- **Guardrails Expansion** - Apply guardrails across the Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, and the Anthropic Messages API via the unified `apply_guardrails` function
- **New Guardrail Providers** - Gray Swan, Dynamo AI, IBM Guardrails, Lasso Security v3, and Bedrock Guardrail `apply_guardrail` endpoint support
- **Video Generation API** - Native support for OpenAI Sora-2 and Azure Sora-2 (Pro, Pro-High-Res) with cost tracking and logging support
- **Azure AI Speech (TTS)** - Native Azure AI Speech integration with cost tracking for standard and HD voices
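The new `/v1/search` endpoint is served by the gateway like any other proxy route. A minimal sketch of a request against a locally running proxy (the `search_provider` field name and the `sk-1234` key are illustrative assumptions, not the documented schema - check the Search API docs for the exact request format):

```python
import json
from urllib import request

# Hypothetical request body -- field names are assumptions for illustration.
payload = {
    "query": "What's new in LiteLLM v1.79.0?",
    "search_provider": "tavily",
}

req = request.Request(
    "http://localhost:4000/v1/search",      # proxy started by the `docker run` above
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-1234",  # your proxy key (placeholder)
        "Content-Type": "application/json",
    },
    method="POST",
)
# request.urlopen(req) would send it; skipped here since it needs a running proxy.
print(req.full_url)  # http://localhost:4000/v1/search
```

Because the endpoint lives on the gateway, the same cost tracking and key management applied to chat routes applies to search spend.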
 
## New Models / Updated Models

### New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features | 
|---|---|---|---|---|---|
| Bedrock | anthropic.claude-3-7-sonnet-20250219-v1:0 | 200K | $3.60 | $18.00 | Chat, reasoning, vision, function calling, prompt caching, computer use | 
| Bedrock GovCloud | us-gov-west-1/anthropic.claude-3-7-sonnet-20250219-v1:0 | 200K | $3.60 | $18.00 | Chat, reasoning, vision, function calling, prompt caching, computer use | 
| Vertex AI | mistral-medium-3 | 128K | $0.40 | $2.00 | Chat, function calling, tool choice | 
| Vertex AI | codestral-2 | 128K | $0.30 | $0.90 | Chat, function calling, tool choice | 
| Bedrock | amazon.titan-image-generator-v1 | - | - | - | Image generation - $0.008/image, $0.01/premium image | 
| Bedrock | amazon.titan-image-generator-v2 | - | - | - | Image generation - $0.008/image, $0.01/premium image | 
| OpenAI | sora-2 | - | - | - | Video generation - $0.10/video/second | 
| Azure | sora-2 | - | - | - | Video generation - $0.10/video/second | 
| Azure | sora-2-pro | - | - | - | Video generation - $0.30/video/second | 
| Azure | sora-2-pro-high-res | - | - | - | Video generation - $0.50/video/second | 
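The table mixes two billing shapes: per-token rates for the chat models and per-second rates for the Sora video models. A quick illustrative sketch of how those rates translate into spend (rates copied from the table; these helpers are for illustration, not a LiteLLM API):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_per_million: float, output_per_million: float) -> float:
    """Cost of a chat call given per-1M-token rates from the table."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

def video_cost(seconds: int, per_second: float) -> float:
    """Cost of a video generation call billed per second of generated video."""
    return seconds * per_second

# Claude 3.7 Sonnet on Bedrock GovCloud: $3.60 in / $18.00 out per 1M tokens
print(round(token_cost(10_000, 2_000, 3.60, 18.00), 4))  # → 0.072
# An 8-second sora-2-pro clip at $0.30/second
print(round(video_cost(8, 0.30), 2))                     # → 2.4
```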
### Features

- **Bedrock**
    - Add AWS us-gov-west-1 Claude 3.7 Sonnet costs - PR #15775
    - Fix the date for Sonnet 3.7 in GovCloud - PR #15800
    - Use the proper Bedrock model name in the health check - PR #15808
    - Support the `embeddings_by_type` response format in Bedrock Cohere Embed v1 - PR #15707
    - Add Titan image generation with cost tracking - PR #15916
- **Vertex AI**
    - Add Mistral Medium 3 and Codestral 2 on Vertex - PR #15887
- **Databricks**
    - Allow prompt caching to be used for Anthropic Claude on Databricks - PR #15801
- **OpenAI**
    - OpenAI videos refactoring - PR #15900
- **General**
    - Read from the `custom-llm-provider` header - PR #15528
 
 
## LLM API Endpoints

### Features

- **Responses API**
    - Add GPT-4.1 pricing for the Responses endpoint - PR #15593
    - Fix incorrect status value in the Responses API with Gemini - PR #15753
    - Simplify reasoning item handling for gpt-5-codex - PR #15815
    - Fix ErrorEvent ValidationError when the OpenAI Responses API returns a nested error structure - PR #15804
    - Fix reasoning item ID auto-generation causing encrypted content verification errors - PR #15782
    - Support tags in metadata - PR #15867
    - Security: prevent User A from retrieving User B's response if the response.id is leaked - PR #15757
- **Search APIs**
    - Add `search()` APIs for web search - Perplexity API - PR #15769
    - Add Tavily Search API - PR #15770
    - Add Parallel AI Search API - PR #15772
    - Add Exa AI Search API to LiteLLM - PR #15774
    - Add `/search` endpoint on the LiteLLM Gateway - PR #15780
    - Add DataforSEO Search API - PR #15817
    - Add Google PSE search provider - PR #15816
    - Add cost tracking for Search API requests - Google PSE, Tavily, Parallel AI, Exa AI - PR #15821
    - Backend: allow storing configured Search APIs in the DB - PR #15862
    - Exa Search API - ensure request params are sent to Exa AI - PR #15855
- **Vector Stores**
    - Support the Vertex AI Search API as a vector store through LiteLLM - PR #15781
    - Azure AI - Search vector stores - PR #15873
    - Vertex AI Search vector store - passthrough endpoint support + vector store search cost tracking support - PR #15824
    - Don't raise an error if a managed object is not found - PR #15873
    - Show config.yaml vector stores on the UI - PR #15873
    - Cost tracking for search spend - PR #15859
- **Images**
    - Pass user-defined headers and `extra_headers` to image-edit calls - PR #15811
- **Passthrough**
    - Fix: hooks broken on `/bedrock` passthrough due to missing metadata - PR #15849
- **Realtime API**
    - Fix: OpenAI Realtime API integration fails due to `websockets.exceptions.PayloadTooBig` error - PR #15751
 
 
## Management Endpoints / UI

### Features

- **Passthrough**
- **Organizations**
    - Allow org admins to create teams on the UI - PR #15924
- **Search Tools**
- **General**
    - Fix routing for custom server root path - PR #15701
 
 
## Logging / Guardrail / Prompt Management Integrations

### Features

- **Sentry**
    - Add SENTRY_ENVIRONMENT configuration for the Sentry integration - PR #15760
- **Helicone**
    - Fix JSON serialization error in Helicone logging by removing the OpenTelemetry span from metadata - PR #15728
- **MLflow**
    - Fix MLflow tags - split `request_tags` into (key, val) if the request tag has a colon - PR #15914
- **General**
    - Rename `configured_cold_storage_logger` to `cold_storage_custom_logger` - PR #15798
 
 
### Guardrails

- **Dynamo AI**
    - New guardrail - Dynamo AI Guardrail - PR #15920
- **IBM Guardrails**
    - IBM Guardrails integration - PR #15924
- **Bedrock**
    - Implement Bedrock Guardrail `apply_guardrail` endpoint support - PR #15892
- **General**
    - Guardrails - Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, and Anthropic Messages API support via the unified `apply_guardrails` function - PR #15706
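With the unified `apply_guardrails` function, a single guardrail definition in the proxy config covers the newly supported call types. A minimal sketch assuming the standard `guardrails` block of the proxy `config.yaml` (the name, mode, and Bedrock identifier values below are placeholders - see the guardrails docs for your provider's exact params):

```yaml
guardrails:
  - guardrail_name: "bedrock-pre-guard"       # placeholder name
    litellm_params:
      guardrail: bedrock                      # guardrail provider
      mode: "pre_call"                        # run before the model call
      guardrailIdentifier: "your-guardrail-id"
      guardrailVersion: "DRAFT"
```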
 
## Spend Tracking, Budgets and Rate Limiting

- **Rate Limiting**
 
## MCP Gateway

- **OAuth**
 
## Performance / Loadbalancing / Reliability improvements

- **Database**
    - Minimize the occurrence of deadlocks - PR #15281
 
- **Redis**
    - Apply `max_connections` configuration to the Redis async client - PR #15797
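With PR #15797, `max_connections` is honored by the async Redis client as well. A config sketch, assuming the usual `cache_params` placement (host and port values are placeholders):

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: os.environ/REDIS_HOST
    port: 6379
    max_connections: 100   # now enforced on the async client too
```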
 
- **Caching**
    - Add documentation for the `enable_caching_on_provider_specific_optional_params` setting - PR #15885
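A sketch of the documented setting, assuming it lives under `litellm_settings` (see the linked caching docs for the authoritative placement):

```yaml
litellm_settings:
  cache: true
  enable_caching_on_provider_specific_optional_params: true
```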
 
## Documentation Updates

- **Provider Documentation**
 
## New Contributors
- @tlecomte made their first contribution in PR #15528
 - @tomhaynes made their first contribution in PR #15645
 - @talalryz made their first contribution in PR #15720
 - @1vinodsingh1 made their first contribution in PR #15736
 - @nuernber made their first contribution in PR #15775
 - @Thomas-Mildner made their first contribution in PR #15760
 - @javiergarciapleo made their first contribution in PR #15721
 - @lshgdut made their first contribution in PR #15717
 - @kk-wangjifeng made their first contribution in PR #15530
 - @anthonyivn2 made their first contribution in PR #15801
 - @romanglo made their first contribution in PR #15707
 - @mythral made their first contribution in PR #15859
 - @mubashirosmani made their first contribution in PR #15866
 - @CAFxX made their first contribution in PR #15281
 - @reflection made their first contribution in PR #15914
 - @shadielfares made their first contribution in PR #15917
 
## PR Count Summary

### 10/26/2025
- New Models / Updated Models: 20
 - LLM API Endpoints: 29
 - Management Endpoints / UI: 5
 - Logging / Guardrail / Prompt Management Integrations: 10
 - Spend Tracking, Budgets and Rate Limiting: 2
 - MCP Gateway: 2
 - Performance / Loadbalancing / Reliability improvements: 3
 - Documentation Updates: 5
 

