Audio Endpoints Documentation
The codebase contains two audio synthesis endpoints:
- /api/audio/speech - Main TTS endpoint supporting REST and streaming modes
- /api/audio/speech/sse - Dedicated SSE streaming endpoint
Overview
The Audio Synthesis API provides text-to-speech (TTS) capabilities using Google Cloud’s Chirp voice cloning technology with Gemini voices. The API supports both REST and Server-Sent Events (SSE) streaming modes.
Base URL: ${NEXT_PUBLIC_BASE_URL}
Version: 1.0.0
Authentication
All audio endpoints require API key authentication.
Headers
Getting an API Key
- Authenticate with OAuth at /oauth/get-key
- Create an API key at /api/auth/api-keys
- Save the key securely (shown only once)
Authentication Errors
| Status | Error | Description |
|---|---|---|
| 401 | invalid_api_key | API key missing or malformed |
| 401 | invalid_api_key | API key not found or revoked |
| 403 | insufficient_scope | API key lacks required scopes |
| 429 | rate_limit_exceeded | Too many requests |
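A client can branch on the status and error code pairs in the table above. The following is a minimal Python sketch of that idea, not part of the API itself: the function name `classify_auth_error` and the action labels it returns are hypothetical, and it assumes the caller has already extracted the error code from the response, since the response body shape is not specified here.

```python
# Hypothetical lookup: documented (status, error) pairs mapped to a client action.
AUTH_ERRORS = {
    (401, "invalid_api_key"): "reauthenticate",    # key missing, malformed, not found, or revoked
    (403, "insufficient_scope"): "request_scope",  # key lacks required scopes
    (429, "rate_limit_exceeded"): "retry_later",   # too many requests
}


def classify_auth_error(status: int, error: str) -> str:
    """Return a suggested client action for a documented authentication error.

    Combinations that are not in the table return "unknown" instead of guessing.
    """
    return AUTH_ERRORS.get((status, error), "unknown")
```

Keeping the mapping in one dictionary means a new error code only needs one new entry, and unexpected responses fall through to a single "unknown" path.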
Endpoints
1. REST Speech Synthesis
POST /api/audio/speech
Synthesize speech from text using a specified voice. Returns either a public URL or an audio buffer, depending on return_format.
Request Body
REST Method Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| voice_id | string | ✓ | - | Voice identifier from database |
| text | string | ✓ | - | Text to synthesize (max ~5000 chars) |
| sample_rate | number | ✗ | 44100 | Sample rate in Hz (24000, 44100, 48000) |
| language_code | string | ✗ | 'en-US' | BCP-47 language code |
| method | string | ✗ | 'rest' | Must be 'rest' for this mode |
| return_format | string | ✗ | 'url' | 'url' for public link, 'buffer' for binary |
| output_format | string | ✗ | 'wav' | Audio format: 'wav', 'pcm', 'mp3' |
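The parameter table can be turned into a small client-side check before a request is sent. The sketch below builds a JSON body with the documented defaults and rejects values outside the documented sets. The function name `build_rest_speech_body` is hypothetical, the 5000-character cap is an assumption taken from the approximate "max ~5000 chars" note, and authentication headers are left out because they are not specified in this section.

```python
import json

ALLOWED_SAMPLE_RATES = {24000, 44100, 48000}
ALLOWED_RETURN_FORMATS = {"url", "buffer"}
ALLOWED_OUTPUT_FORMATS = {"wav", "pcm", "mp3"}
MAX_TEXT_CHARS = 5000  # assumption: the table only says "max ~5000 chars"


def build_rest_speech_body(voice_id, text, sample_rate=44100,
                           language_code="en-US", return_format="url",
                           output_format="wav"):
    """Validate parameters against the table and return a JSON request body."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("text exceeds the assumed character limit")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if return_format not in ALLOWED_RETURN_FORMATS:
        raise ValueError(f"unsupported return_format: {return_format}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    return json.dumps({
        "voice_id": voice_id,
        "text": text,
        "sample_rate": sample_rate,
        "language_code": language_code,
        "method": "rest",  # fixed value for REST mode, per the table
        "return_format": return_format,
        "output_format": output_format,
    })
```

Validating locally does not replace server-side validation; it only surfaces obvious mistakes before a network round trip.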
Response (URL Mode)
Response (Buffer Mode)
Content-Type: audio/wav | audio/pcm | audio/mpeg
Headers:
- Content-Disposition: attachment; filename={id}.{format}
- X-Voice: Voice name
- X-Method: 'rest'
- X-Chars: Character count
- X-Format: Output format
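In buffer mode the metadata travels in the response headers listed above, so a client may want to collect them into one structure. The following Python sketch shows one way to do that. The function name `parse_buffer_headers` is hypothetical, and it assumes the headers arrive as a plain dictionary and that X-Chars holds a decimal integer.

```python
import re


def parse_buffer_headers(headers):
    """Extract the documented metadata from buffer-mode response headers."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lower = {key.lower(): value for key, value in headers.items()}

    # Pull the filename out of "attachment; filename={id}.{format}",
    # tolerating optional double quotes around the value.
    disposition = lower.get("content-disposition", "")
    match = re.search(r'filename="?([^";]+)"?', disposition)

    chars = lower.get("x-chars")
    return {
        "filename": match.group(1).strip() if match else None,
        "voice": lower.get("x-voice"),
        "method": lower.get("x-method"),
        # Assumption: X-Chars is a decimal integer; anything else becomes None.
        "chars": int(chars) if chars is not None and chars.isdigit() else None,
        "format": lower.get("x-format"),
    }
```

Missing headers map to None rather than raising, so the caller decides which fields are mandatory.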