InstantVoiceAI

Text to Speech API: one endpoint, 100 voices, instant MP3

A clean REST API for AI text to speech. POST your text, get MP3 bytes back — 100 natural voices across 29 languages, billed from one flat monthly character allowance.

InstantVoiceAI gives you a text to speech API that is deliberately small: one POST endpoint, one bearer token, and raw MP3 bytes in the response. There is no SDK to install, no async job to poll, and no per-voice pricing maze to reason about. Send `{"text": "Hello world", "voice": "azure-aria"}` to `https://instantvoiceai.com/api/v1/tts` and you get back `audio/mpeg` that you can stream, save, or pipe straight into your build. The full web-app catalog is available through the API: all 100 voices and 29 languages, powered by Microsoft Azure and Google neural models.

Pricing is the part developers usually care about most, so we keep it blunt. Every API call draws from the same flat monthly character allowance as your account, so you can forecast cost by counting characters instead of decoding credits or per-request tiers. API access is included only on the Pro ($49/mo, 2,000,000 characters) and Studio ($99/mo, 4,000,000 characters) plans; it is not available on the Free, Basic, Starter, or Creator tiers. If you use the full allowance, that is about $24.50 per million characters on Pro and $24.75 on Studio, which for many production workloads can work out to more characters per dollar than the ElevenLabs API, Google Cloud Text-to-Speech, or Amazon Polly, depending on the voices and tiers you compare against. If you have outgrown a generic TTS service or want a simpler one to start with, this is the integration that takes an afternoon, not a sprint.

A text to speech API with exactly one endpoint

Most TTS REST APIs make you learn a request envelope, a voice taxonomy, and an audio-encoding matrix before you hear a single word. InstantVoiceAI collapses that into a single call. You POST JSON to one URL, authenticate with one header, and receive the finished MP3 in the response body — synchronously. There is no job queue, no webhook, and nothing to download separately.

  • Endpoint: POST https://instantvoiceai.com/api/v1/tts
  • Auth header: Authorization: Bearer ivai_your_key
  • Request body: {"text": "Hello world", "voice": "azure-aria", "speed": "normal"}
  • Response: raw MP3 bytes with Content-Type: audio/mpeg
  • Synchronous — the audio comes back in the same request, no polling
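
The call described above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names and the 60-second timeout are assumptions for illustration, and error responses are not documented on this page, so the sketch simply lets urllib raise on any 4xx or 5xx status.

```python
import json
import urllib.request

TTS_URL = "https://instantvoiceai.com/api/v1/tts"


def build_tts_request(api_key, text, voice="azure-aria", speed="normal"):
    """Build the POST request described above without sending it."""
    body = json.dumps({"text": text, "voice": voice, "speed": speed}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key, text, voice="azure-aria", speed="normal", timeout=60):
    """Send the request and return the response body (MP3 bytes on success)."""
    request = build_tts_request(api_key, text, voice, speed)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because the response body is the audio itself, writing the return value of synthesize() to a file opened in binary mode is enough to get a playable MP3.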

Make your first call in under two minutes

Create a key at /api-keys in the dashboard, drop it into the curl command below, and you have a working MP3. Each key starts with ivai_ and is shown exactly once at creation time, then stored hashed, so copy it into your secrets manager or environment as soon as you create it. Here is a complete, copy-pasteable request:

curl -X POST https://instantvoiceai.com/api/v1/tts \
  -H "Authorization: Bearer ivai_YOURKEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice":"azure-aria"}' \
  --output hello.mp3

That writes a playable hello.mp3 to your working directory. Swap the voice id, change the text, and you are integrated. Note that API access requires a Pro or Studio plan.

  1. Subscribe to a Pro or Studio plan (API access is included only on those tiers)
  2. Open /api-keys and create a key, then copy the ivai_ token immediately (it is shown only once)
  3. Send the curl request above and check the resulting hello.mp3
  4. Move the key into an environment variable or secrets manager, and never commit it
  5. Change the voice id and text, and wire the call into your app
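
For the step about keeping the key out of source control, one possible shape is to read it from the environment and fail early if it is missing. This is a sketch under assumptions: the variable name INSTANTVOICEAI_API_KEY and the helper names are hypothetical, and the only fact taken from this page is that keys start with ivai_ and travel in an Authorization: Bearer header.

```python
import os

# Hypothetical variable name; use whatever your deployment already standardizes on.
API_KEY_ENV_VAR = "INSTANTVOICEAI_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise with a clear message."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    if not key.startswith("ivai_"):
        raise RuntimeError(f"{API_KEY_ENV_VAR} does not look like an ivai_ key")
    return key


def auth_header(key):
    """Build the Authorization header the endpoint expects."""
    return {"Authorization": f"Bearer {key}"}


def redact(key):
    """Shorten a key for log output so the full token never lands in logs."""
    return key[:5] + "..." if len(key) > 5 else "..."
```

Checking the prefix catches the common mistake of pasting a different service's token into the variable, and redact() gives you something safe to print when debugging.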

Request and response, in full

The request body takes three fields. `text` is the string to speak. `voice` is any id from the catalog (for example azure-aria). `speed` accepts normal, slow, or fast. The response is the audio itself: a binary MP3 stream with Content-Type audio/mpeg, ready to write to a file, return to a browser, or push to object storage. Because the response is plain MP3 bytes, it slots into any HTTP client in any language — no custom parsing required.

  • text — the words to synthesize (required)
  • voice — a voice id from the 100-voice catalog, e.g. azure-aria
  • speed — normal, slow, or fast
  • Returns: audio/mpeg (raw MP3 bytes) on success
  • Works from any language: Python requests, Node fetch, Go net/http, Ruby, PHP, Bash
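
If you want to catch bad input before spending a request, the three fields above can be validated client-side, and the response can be checked for the audio/mpeg content type before you write it to disk. The helper names here are illustrative assumptions. The sketch always sends voice because every example on this page does; whether the server treats voice as optional is not stated here.

```python
import json

ALLOWED_SPEEDS = {"normal", "slow", "fast"}


def build_payload(text, voice, speed=None):
    """Return the JSON request body as bytes, validating the documented fields."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    if not isinstance(voice, str) or not voice:
        raise ValueError("voice must be a voice id such as 'azure-aria'")
    payload = {"text": text, "voice": voice}
    if speed is not None:
        if speed not in ALLOWED_SPEEDS:
            raise ValueError(f"speed must be one of {sorted(ALLOWED_SPEEDS)}")
        payload["speed"] = speed
    return json.dumps(payload).encode("utf-8")


def is_mp3_response(content_type):
    """True when a Content-Type header value denotes audio/mpeg."""
    if not content_type:
        return False
    # Ignore any parameters after ';' and compare case-insensitively.
    return content_type.split(";", 1)[0].strip().lower() == "audio/mpeg"
```

Checking the content type before saving protects against writing an error body to a .mp3 file by mistake.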

All 100 voices and 29 languages, same catalog as the app

The API is not a stripped-down subset. Every voice and language available in the InstantVoiceAI web app is reachable through the same /api/v1/tts endpoint, powered by Microsoft Azure and Google neural models. Browse the full set on the voices page, copy the id you want, and pass it as the voice field. That means you can localize an app across 29 languages without stitching together multiple TTS providers.

  • 100 natural AI voices, the same ones in the web app
  • 29 languages from a single endpoint
  • Azure + Google neural models under the hood
  • Pick a voice on /voices and pass its id directly to the API
  • Localize across languages without adding a second TTS provider

Flat character pricing you can actually forecast

API calls draw from your account's monthly character allowance, the same pool the web app uses, so cost is a function of characters, not opaque credits or per-request surcharges. Pro gives you 2,000,000 characters a month (plus 200,000 premium HD-voice characters) and Studio gives you 4,000,000. There is no separate API meter to reconcile and no premium markup for using the endpoint instead of the UI. If a busy month runs long, a one-time 100,000-character top-up ($8, never expires) covers the overflow. For high-volume voiceover pipelines that use most of the allowance, that works out to roughly $24.50 (Pro) or $24.75 (Studio) per million characters, a flat figure you can compare directly with usage-billed competitors.

  • Pro $49/mo — 2,000,000 characters/month (+200,000 premium HD-voice chars), API access included
  • Studio $99/mo — 4,000,000 characters/month, API access included
  • API and web app share one character allowance — no double-billing
  • No per-request fee and no premium markup for API calls
  • One-time top-up: 100,000 characters for $8, never expires
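
The forecasting arithmetic behind those numbers is simple enough to script. The sketch below uses only the prices and allowances listed above, with several assumptions: characters are estimated with len(), which may not match exactly how the service counts them; overage is covered by buying whole top-ups in the same month; and the 200,000 premium HD-voice characters on Pro are ignored.

```python
import math

PLANS = {
    "pro": {"price_usd": 49, "characters": 2_000_000},
    "studio": {"price_usd": 99, "characters": 4_000_000},
}
TOP_UP_CHARACTERS = 100_000
TOP_UP_PRICE_USD = 8


def rate_per_million(price_usd, characters):
    """Effective price per one million characters."""
    return price_usd * 1_000_000 / characters


def count_characters(texts):
    """Estimate usage as the total length of all strings (an assumption)."""
    return sum(len(text) for text in texts)


def monthly_cost(plan, characters_used):
    """Plan price plus however many whole top-ups cover the overage."""
    allowance = PLANS[plan]["characters"]
    overage = max(0, characters_used - allowance)
    top_ups = math.ceil(overage / TOP_UP_CHARACTERS)
    return PLANS[plan]["price_usd"] + top_ups * TOP_UP_PRICE_USD
```

For example, 2,250,000 characters in a month on Pro would need three top-ups under these assumptions, for $73 in total, while the same volume fits inside Studio's allowance at $99. The top-up rate is $80 per million characters, so a workload that regularly exceeds the Pro allowance by more than about 600,000 characters is cheaper on Studio.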

Pronunciation rules carry over to the API

Anything you have configured in your pronunciation dictionary applies to API output too. If you have taught the app to say a brand name, an acronym, or a tricky proper noun a specific way, the same custom replacements are applied to the text before synthesis on every API call. You define the rules once in the dashboard and they govern both the UI and the endpoint — so generated audio stays consistent no matter which path created it.

  • Custom word replacements are applied before speech on API requests
  • Same dictionary governs the web app and the API — define it once
  • Keep brand names, acronyms, and proper nouns pronounced consistently
  • No need to pre-process text in your own code to fix pronunciation

What developers build with it

A synchronous MP3 endpoint with flat character pricing fits a wide range of automated audio work. Because there is no job queue or callback to manage, it is especially well suited to scripts, build steps, and request-time generation where you just want bytes back.

  • Add natural TTS to web and mobile apps without a heavy SDK
  • Automate voiceover pipelines for video, podcasts, and social clips
  • Generate audio at scale from a CMS or static-site build step
  • Power e-learning platforms with narrated lessons in 29 languages
  • Produce IVR and phone-system prompts, and ship accessible audio versions of written content
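
As an illustration of the build-step use case, the sketch below renders one MP3 per script and skips files that already exist, so repeated builds do not spend characters on audio you already have. The function name and arguments are assumptions; synth stands in for whatever function you use to call the endpoint and return MP3 bytes.

```python
from pathlib import Path


def generate_all(scripts, out_dir, synth):
    """Write one MP3 per entry in `scripts` and return the names that were generated.

    scripts: dict mapping an output name (without extension) to the text to speak.
    synth:   callable that takes the text and returns MP3 bytes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    generated = []
    for name, text in scripts.items():
        target = out_dir / f"{name}.mp3"
        if target.exists():
            continue  # already rendered in an earlier build
        target.write_bytes(synth(text))
        generated.append(name)
    return generated
```

Skipping by file name is the simplest cache, but it will not notice when the text behind a name changes. If your scripts are edited often, keying the file name on a hash of the text is one way to handle that.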

A simpler, cheaper alternative to Polly, Google TTS, and the ElevenLabs API

InstantVoiceAI is not trying to be the most configurable TTS platform in existence — it is trying to be the one you can integrate before lunch and budget without a spreadsheet. Compared with the ElevenLabs API, Amazon Polly, and Google Cloud Text-to-Speech, you trade a sprawling options surface for one endpoint, one auth header, and a flat monthly character allowance instead of per-request or credit-based metering. For teams that mainly need good neural voices, many languages, and predictable cost, that simplicity is the feature.

  • One POST endpoint vs multi-step request setup and async jobs
  • Flat monthly character allowance vs per-request or credit metering
  • 100 voices across 29 languages from a single catalog
  • Raw MP3 in the response — no extra fetch or decode step
  • Pair it with the web app's voice cloning, dubbing, and AI voice design

| Capability | InstantVoiceAI API | ElevenLabs API | Amazon Polly / Google Cloud TTS |
| --- | --- | --- | --- |
| Endpoint to integrate | One POST endpoint, MP3 back | Multiple endpoints, voice + settings setup | Service SDK, request envelope to learn |
| Pricing model | Flat monthly character allowance | Subscription + credit/usage tiers | Pay-per-character usage billing |
| Auth | Single Bearer token (ivai_) | API key / token | Cloud IAM credentials or access keys |
| Response format | Raw MP3 bytes (audio/mpeg) | Audio stream / file | Audio stream / file |
| Voices | 100 natural AI voices | Large voice library | Dozens of neural voices |
| Languages | 29 languages | Many languages | Many languages |
| Plans with API access | Pro $49 & Studio $99 only | Paid plans | Cloud account, pay as you go |
| Best for | Simple, predictable, high-volume TTS | Expressive voice variety | Deep AWS/GCP integration |

Frequently asked questions

How do I get a text to speech API key?

Subscribe to the Pro ($49/mo) or Studio ($99/mo) plan, then open /api-keys in your dashboard and create a key. Each key starts with ivai_ and is shown only once at creation, then stored hashed — copy it into your environment or secrets manager right away. Pass it on every request as Authorization: Bearer ivai_yourkey.

Which plans include API access?

API access is included only on the Pro and Studio plans. It is not available on the Free, Basic, Starter, or Creator tiers. If you are on a lower plan and want the API, upgrade to Pro (2,000,000 characters/month) or Studio (4,000,000 characters/month) on the pricing page.

What does the TTS API return?

It returns raw MP3 bytes with Content-Type: audio/mpeg — the finished audio, synchronously, in the same HTTP response. There is no job to poll and no second request to download the file. Write the bytes to disk, stream them to a browser, or push them to object storage.

How is API usage billed?

Every API call draws from the same flat monthly character allowance as your account — there is no separate API meter and no premium markup for using the endpoint. Cost is a function of characters synthesized, so you can forecast it by counting characters. If a busy month runs long, a one-time 100,000-character top-up costs $8 and never expires.

Can I use all 100 voices and 29 languages through the API?

Yes. The API exposes the full catalog — the same 100 natural AI voices and 29 languages as the web app, powered by Azure and Google neural models. Browse them on the voices page, copy the voice id you want (for example azure-aria), and pass it as the voice field in your request body.

How does this compare to the ElevenLabs API, Amazon Polly, or Google Cloud TTS?

It is built to be simpler and more predictable: one POST endpoint instead of multi-step setup, one bearer token instead of cloud IAM, raw MP3 in the response instead of an extra fetch, and a flat monthly character allowance instead of per-request or credit-based metering. For high-volume TTS where you mainly need good neural voices and many languages, that often means far more characters per dollar.

Start free — 100 voices, 29 languages

No credit card required. Paid plans from $4/month.

Get your API key