Text to Speech API: one endpoint, 100 voices, instant MP3

A clean REST API for AI text to speech. POST your text, get MP3 bytes back — 100 natural voices across 29 languages, billed from one flat monthly character allowance.

InstantVoiceAI gives you a text to speech API that is deliberately small: one POST endpoint, one bearer token, and raw MP3 bytes in the response. There is no SDK to install, no async job to poll, and no per-voice pricing maze to reason about. Send `{"text": "Hello world", "voice": "azure-aria"}` to `https://instantvoiceai.com/api/v1/tts` and you get back `audio/mpeg` you can stream, save, or pipe straight into your build. The API exposes the full catalog: all 100 voices and 29 languages from the web app, powered by Microsoft Azure and Google neural models.

Pricing is the part developers usually care about most, so we keep it blunt. Every API call draws from the same flat monthly character allowance as your account, so you can forecast cost by counting characters instead of decoding credits or per-request tiers. API access is included only on the Pro ($49/mo, 2,000,000 characters) and Studio ($99/mo, 4,000,000 characters) plans — it is not available on the Free, Basic, Starter, or Creator tiers. For many production workloads that works out to far more characters per dollar than the ElevenLabs API, Google Cloud Text-to-Speech, or Amazon Polly. If you have outgrown a generic TTS service or want a simpler one to start with, this is the integration that takes an afternoon, not a sprint.

A text to speech API with exactly one endpoint

Most TTS REST APIs make you learn a request envelope, a voice taxonomy, and an audio-encoding matrix before you hear a single word. InstantVoiceAI collapses that into a single call. You POST JSON to one URL, authenticate with one header, and receive the finished MP3 in the response body — synchronously. There is no job queue, no webhook, and nothing to download separately.

Endpoint: POST https://instantvoiceai.com/api/v1/tts
Auth header: Authorization: Bearer ivai_YOURKEY
Request body: {"text": "Hello world", "voice": "azure-aria", "speed": "normal"}
Response: raw MP3 bytes with Content-Type: audio/mpeg
Synchronous — the audio comes back in the same request, no polling

Make your first call in under two minutes

Create a key at /api-keys in the dashboard, drop it into the curl command below, and you have a working MP3. Each key starts with ivai_ and is shown exactly once at creation time, then stored hashed, so copy it into your secrets manager or environment as soon as you create it. Here is a complete, copy-pasteable request:

curl -X POST https://instantvoiceai.com/api/v1/tts \
  -H "Authorization: Bearer ivai_YOURKEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice":"azure-aria"}' \
  --output hello.mp3

That writes a playable hello.mp3 to your working directory. Swap the voice id, change the text, and you are integrated. Note that API access requires a Pro or Studio plan.

1. Subscribe to a Pro or Studio plan (API access is included only on those tiers)
2. Open /api-keys and create a key — copy the ivai_ token immediately, it is shown once
3. Send the curl request above and check the resulting hello.mp3
4. Move the key into an environment variable or secrets manager — never commit it
5. Change the voice id and text, and wire the call into your app
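
For callers who would rather use Python than curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the IVAI_API_KEY environment variable and the function names are assumptions made for illustration, while the URL, headers, and body fields are the ones given above.

```python
import json
import os
import urllib.request

API_URL = "https://instantvoiceai.com/api/v1/tts"


def build_tts_request(text, voice="azure-aria", api_key=None):
    """Build (but do not send) the POST request described above."""
    # IVAI_API_KEY is an assumed variable name, not one the service defines.
    key = api_key or os.environ["IVAI_API_KEY"]
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


def save_speech(text, path, voice="azure-aria", api_key=None):
    """Send the request and write the returned MP3 bytes to path."""
    request = build_tts_request(text, voice, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the key exported in the environment, save_speech("Hello world", "hello.mp3") should produce the same file as the curl command. Reading the key from the environment keeps it out of source control, in line with step 4.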

Request and response, in full

The request body takes three fields. `text` is the string to speak. `voice` is any id from the catalog (for example azure-aria). `speed` accepts normal, slow, or fast. The response is the audio itself: a binary MP3 stream with Content-Type audio/mpeg, ready to write to a file, return to a browser, or push to object storage. Because the response is plain MP3 bytes, it slots into any HTTP client in any language — no custom parsing required.

text — the words to synthesize (required)
voice — a voice id from the 100-voice catalog, e.g. azure-aria
speed — normal, slow, or fast
Returns: audio/mpeg (raw MP3 bytes) on success
Works from any language: Python requests, Node fetch, Go net/http, Ruby, PHP, Bash
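
The field list above can be turned into a small local check before a request goes out, plus a guard on what comes back. The sketch below is hedged in two places. The page does not say whether voice and speed are optional, so it always sends all three fields. The error format is not documented here either, so it simply treats anything other than a 2xx status with audio/mpeg as a failure. Both function names are hypothetical.

```python
import json

ALLOWED_SPEEDS = ("normal", "slow", "fast")


def build_payload(text, voice="azure-aria", speed="normal"):
    """Return the JSON request body as bytes, validating fields locally first."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    if speed not in ALLOWED_SPEEDS:
        raise ValueError(f"speed must be one of {ALLOWED_SPEEDS}, got {speed!r}")
    return json.dumps({"text": text, "voice": voice, "speed": speed}).encode("utf-8")


def extract_mp3(status, content_type, body):
    """Return body if the response looks like MP3 audio, otherwise raise."""
    # Assumption: success is any 2xx status carrying audio/mpeg. Everything
    # else is treated as an error because its shape is not specified here.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if not 200 <= status < 300 or media_type != "audio/mpeg":
        raise RuntimeError(f"unexpected response: status={status}, type={content_type!r}")
    return body
```

Checking the Content-Type before writing to disk avoids saving an error message under an .mp3 filename.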

All 100 voices and 29 languages, same catalog as the app

The API is not a stripped-down subset. Every voice and language available in the InstantVoiceAI web app is reachable through the same /api/v1/tts endpoint, powered by Microsoft Azure and Google neural models. Browse the full set on the voices page, copy the id you want, and pass it as the voice field. That means you can localize an app across 29 languages without touching multiple providers or stitching together different voice vendors.

100 natural AI voices, the same ones in the web app
29 languages from a single endpoint
Azure + Google neural models under the hood
Pick a voice on /voices and pass its id directly to the API
Localize across languages without adding a second TTS provider
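
Because every language goes through the same endpoint, localization can reduce to a lookup from language to voice id. The sketch below shows one way to prepare request bodies for several languages at once. Only azure-aria is a voice id taken from this page. The other ids are hypothetical placeholders and need to be replaced with real ids copied from the voices page.

```python
import json

# Only "azure-aria" appears in the text above. The other two ids are
# hypothetical placeholders, not real catalog entries.
VOICE_BY_LANGUAGE = {
    "en": "azure-aria",
    "es": "example-spanish-voice-id",
    "de": "example-german-voice-id",
}


def localized_payloads(translations, voice_by_language=VOICE_BY_LANGUAGE):
    """Map {language: text} to {language: JSON body bytes} for the one endpoint."""
    payloads = {}
    for language, text in translations.items():
        if language not in voice_by_language:
            raise KeyError(f"no voice configured for language {language!r}")
        payloads[language] = json.dumps(
            {"text": text, "voice": voice_by_language[language]},
            ensure_ascii=False,  # keep accented characters readable in the body
        ).encode("utf-8")
    return payloads
```

Each resulting body can be posted to /api/v1/tts with the same bearer token, so adding a language means adding one dictionary entry rather than a second provider.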

Flat character pricing you can actually forecast

API calls draw from your account's monthly character allowance — the same pool the web app uses — so cost is a function of characters, not opaque credits or per-request surcharges. Pro gives you 2,000,000 characters a month (plus 200,000 premium HD-voice characters) and Studio gives you 4,000,000. There is no separate API meter to reconcile, no premium markup for using the endpoint instead of the UI, and a one-time 100,000-character top-up ($8, never expires) is there if a busy month runs long. For high-volume voiceover pipelines, that flat model is typically far cheaper per character than usage-billed competitors.

Pro $49/mo — 2,000,000 characters/month (+200,000 premium HD-voice chars), API access included
Studio $99/mo — 4,000,000 characters/month, API access included
API and web app share one character allowance — no double-billing
No per-request fee and no premium markup for API calls
One-time top-up: 100,000 characters for $8, never expires
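
The plan figures above are enough to forecast cost with simple arithmetic. At full use, Pro works out to $24.50 per million characters ($49 / 2,000,000) and Studio to $24.75 ($99 / 4,000,000), while a top-up works out to $80 per million ($8 / 100,000). The sketch below encodes that calculation. It assumes that more than one top-up can be bought in a month, which this page does not state. It also ignores the premium HD-voice characters and any top-up balance carried over from earlier months.

```python
import math

PLANS = {
    "Pro": {"price_usd": 49, "characters": 2_000_000},
    "Studio": {"price_usd": 99, "characters": 4_000_000},
}
TOP_UP_CHARACTERS = 100_000
TOP_UP_PRICE_USD = 8


def usd_per_million(plan):
    """Effective price per million characters if the whole allowance is used."""
    p = PLANS[plan]
    return p["price_usd"] * 1_000_000 / p["characters"]


def monthly_cost(plan, characters_needed):
    """Plan price plus however many top-ups cover the overflow.

    Assumes multiple top-ups can be purchased in one month.
    """
    overflow = max(0, characters_needed - PLANS[plan]["characters"])
    top_ups = math.ceil(overflow / TOP_UP_CHARACTERS)
    return PLANS[plan]["price_usd"] + top_ups * TOP_UP_PRICE_USD
```

Under these assumptions, a month that needs 3,000,000 characters costs $129 on Pro with ten top-ups but $99 on Studio, so the top-up is best treated as a cushion for occasional overruns rather than a substitute for the larger plan.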

Pronunciation rules carry over to the API

Anything you have configured in your pronunciation dictionary applies to API output too. If you have taught the app to say a brand name, an acronym, or a tricky proper noun a specific way, the same custom replacements are applied to the text before synthesis on every API call. You define the rules once in the dashboard and they govern both the UI and the endpoint — so generated audio stays consistent no matter which path created it.

Custom word replacements are applied before speech on API requests
Same dictionary governs the web app and the API — define it once
Keep brand names, acronyms, and proper nouns pronounced consistently
No need to pre-process text in your own code to fix pronunciation
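
To make "replacements applied to the text before synthesis" concrete, here is a local illustration of the idea. It is not the service's implementation, and you do not need to run anything like it, since the dictionary is applied on the server. The matching rules of the real dictionary (case sensitivity, whole-word matching) are not described on this page, so the behavior below is an assumption chosen for the example.

```python
import re


def apply_replacements(text, rules):
    """Replace whole-word matches of each rule key, case-insensitively."""
    if not rules:
        return text
    # Longest keys first, so a multi-word rule wins over a shorter one.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in rules.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

A helper like this can be handy for previewing how a script will read once your rules are applied, for example when reviewing copy before it is sent for synthesis.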

What developers build with it

A synchronous MP3 endpoint with flat character pricing fits a wide range of automated audio work. Because there is no job queue or callback to manage, it is especially well suited to scripts, build steps, and request-time generation where you just want bytes back.

Add natural TTS to web and mobile apps without a heavy SDK
Automate voiceover pipelines for video, podcasts, and social clips
Generate audio at scale from a CMS or static-site build step
Power e-learning platforms with narrated lessons in 29 languages
Produce IVR and phone-system prompts, and ship accessible audio versions of written content

A simpler, cheaper alternative to Polly, Google TTS, and the ElevenLabs API

InstantVoiceAI is not trying to be the most configurable TTS platform in existence — it is trying to be the one you can integrate before lunch and budget without a spreadsheet. Compared with the ElevenLabs API, Amazon Polly, and Google Cloud Text-to-Speech, you trade a sprawling options surface for one endpoint, one auth header, and a flat monthly character allowance instead of per-request or credit-based metering. For teams that mainly need good neural voices, many languages, and predictable cost, that simplicity is the feature.

One POST endpoint vs multi-step request setup and async jobs
Flat monthly character allowance vs per-request or credit metering
100 voices across 29 languages from a single catalog
Raw MP3 in the response — no extra fetch or decode step
Pair it with the web app's voice cloning, dubbing, and AI voice design

Capability	InstantVoiceAI API	ElevenLabs API	Amazon Polly / Google Cloud TTS
Endpoint to integrate	One POST endpoint, MP3 back	Multiple endpoints, voice + settings setup	Service SDK, request envelope to learn
Pricing model	Flat monthly character allowance	Subscription + credit/usage tiers	Pay-per-character usage billing
Auth	Single Bearer token (ivai_)	API key / token	Cloud IAM credentials or access keys
Response format	Raw MP3 bytes (audio/mpeg)	Audio stream / file	Audio stream / file
Voices	100 natural AI voices	Large voice library	Dozens of neural voices
Languages	29 languages	Many languages	Many languages
Plans with API access	Pro $49 & Studio $99 only	Paid plans	Cloud account, pay as you go
Best for	Simple, predictable, high-volume TTS	Expressive voice variety	Deep AWS/GCP integration

Frequently asked questions

How do I get a text to speech API key?

Subscribe to the Pro ($49/mo) or Studio ($99/mo) plan, then open /api-keys in your dashboard and create a key. Each key starts with ivai_ and is shown only once at creation, then stored hashed, so copy it into your environment or secrets manager right away. Pass it on every request as Authorization: Bearer ivai_YOURKEY.

Which plans include API access?

API access is included only on the Pro and Studio plans. It is not available on the Free, Basic, Starter, or Creator tiers. If you are on a lower plan and want the API, upgrade to Pro (2,000,000 characters/month) or Studio (4,000,000 characters/month) on the pricing page.

What does the TTS API return?

It returns raw MP3 bytes with Content-Type: audio/mpeg — the finished audio, synchronously, in the same HTTP response. There is no job to poll and no second request to download the file. Write the bytes to disk, stream them to a browser, or push them to object storage.

How is API usage billed?

Every API call draws from the same flat monthly character allowance as your account — there is no separate API meter and no premium markup for using the endpoint. Cost is a function of characters synthesized, so you can forecast it by counting characters. If a busy month runs long, a one-time 100,000-character top-up costs $8 and never expires.

Can I use all 100 voices and 29 languages through the API?

Yes. The API exposes the full catalog — the same 100 natural AI voices and 29 languages as the web app, powered by Azure and Google neural models. Browse them on the voices page, copy the voice id you want (for example azure-aria), and pass it as the voice field in your request body.

How does this compare to the ElevenLabs API, Amazon Polly, or Google Cloud TTS?

It is built to be simpler and more predictable: one POST endpoint instead of multi-step setup, one bearer token instead of cloud IAM, raw MP3 in the response instead of an extra fetch, and a flat monthly character allowance instead of per-request or credit-based metering. For high-volume TTS where you mainly need good neural voices and many languages, that often means far more characters per dollar.

Start free — 100 voices, 29 languages

No credit card required. Paid plans from $4/month.
