Building Voice-Enabled Content Experiences: How to Add AI Speech Features to Your Headless CMS

Tony Spiro

February 23, 2026

Audio content is no longer optional. Major publishers like The Guardian have added "Listen to this article" features, and accessibility requirements are driving demand for multi-modal content delivery. With Cosmic's built-in AI audio generation, developers can now add voice capabilities to any Cosmic-powered application in minutes, not weeks.

This guide walks you through using Cosmic's native text-to-speech capabilities, covering the dashboard workflow, API integration, content modeling, implementation patterns, and cost optimization.

Why Voice-Enabled Content Matters

Voice-enabled content serves multiple audiences:

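Accessibility compliance: WCAG 2.1 success criteria 1.2.1 and 1.2.3 require alternatives for prerecorded audio and video. WCAG does not mandate an audio version of text, but offering one supports users who find long-form reading difficult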
Multi-modal consumption: Users increasingly want to listen while commuting, exercising, or multitasking
Content reach expansion: Audio versions open your content to visually impaired users and auditory learners
Engagement metrics: Some publications report increased time-on-site when audio options are available

Cosmic AI Audio Generation

Cosmic provides built-in text-to-speech powered by OpenAI's TTS models, available directly from the dashboard and the API. No third-party API keys or external services are required.

Key Features

9 natural-sounding voices: Choose from feminine voices (nova, shimmer, coral, sage, alloy) and masculine voices (echo, onyx, fable, ash)
Two quality tiers: Standard (tts-1) for fast, low-latency generation or HD (tts-1-hd) for higher quality output
Long text support: Texts over 4,096 characters are automatically chunked at paragraph boundaries and concatenated into a single file
Instant CDN delivery: Generated audio is saved as MP3 to your Media Library and served through Cosmic's global CDN
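
To make the long-text behavior concrete, here is a minimal sketch of paragraph-boundary chunking. Cosmic performs this step for you on the server, and its actual implementation is not shown in this guide; the function name chunkByParagraph and the hard-split fallback for oversized paragraphs are assumptions made for illustration.

```typescript
// Illustrative only: Cosmic chunks long text server-side. This is not its code.
const MAX_CHARS = 4096; // per-request character limit quoted in this guide

// Hypothetical helper: groups paragraphs into chunks of at most maxChars.
function chunkByParagraph(text: string, maxChars: number = MAX_CHARS): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph still fits in the open chunk
      continue;
    }
    if (current) chunks.push(current);

    // Assumption: a single paragraph over the limit is hard-split as a fallback.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be sent to the speech model separately and the resulting audio segments joined in order, which is why the split points fall between paragraphs rather than mid-sentence.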

Voice Selection Guide

Voice	Style	Best For
nova	Warm and bright, friendly narration	Podcasts, explainers
shimmer	Soft and intimate, gentle delivery	Meditation, ASMR, bedtime stories
coral	Clear and polished, professional tone	Product demos, business content
sage	Calm and steady, thoughtful pace	Education, tutorials
alloy	Neutral and balanced, versatile	General purpose, articles
echo	Deep and authoritative, confident	Announcements, trailers, news
onyx	Bold and commanding, strong presence	Intros, branding, dramatic reads
fable	Animated and expressive, natural storyteller	Storytelling, audiobooks
ash	Warm and approachable, conversational	Conversational, interviews
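
If you choose voices in code rather than in the dashboard, the table above can be captured as a small lookup. This is a sketch: the voice names come from the table, while the ContentKind labels, the pickVoice helper, and the choice of alloy as the fallback are assumptions made for illustration.

```typescript
// Voice names as listed in the selection guide.
type Voice =
  | "nova" | "shimmer" | "coral" | "sage" | "alloy"
  | "echo" | "onyx" | "fable" | "ash";

// Hypothetical content categories, one per table row.
type ContentKind =
  | "podcast" | "meditation" | "product-demo" | "tutorial" | "article"
  | "news" | "branding" | "storytelling" | "interview";

const VOICE_BY_KIND: Record<ContentKind, Voice> = {
  podcast: "nova",
  meditation: "shimmer",
  "product-demo": "coral",
  tutorial: "sage",
  article: "alloy",
  news: "echo",
  branding: "onyx",
  storytelling: "fable",
  interview: "ash",
};

// Assumption: fall back to alloy, the table's general-purpose voice.
function pickVoice(kind?: ContentKind): Voice {
  return kind ? VOICE_BY_KIND[kind] : "alloy";
}
```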

Generating Audio from the Dashboard

The fastest way to get started is directly from the Cosmic dashboard:

1. Navigate to Media in your project
2. Click Create and select Audio
3. Select a voice from the dropdown (default: Nova)
4. Paste or type the text you want to convert to speech
5. Click Generate to create the audio file

Audio files are automatically saved to your Media Library in MP3 format, ready for use in your applications. For full details on dashboard usage, see the AI dashboard docs.

Content Modeling for Audio

Before implementing programmatic TTS, extend your content model to support audio metadata. In Cosmic, add audio metafields to your blog post or article object type.

The schema should capture the audio metadata needed for playback controls and accessibility features. The audio metafield uses Cosmic's file type, which stores a reference to a media object in your Media Library, giving you access to the full media properties including the CDN URL, file name, and size.
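
As a rough sketch of such a content model, the metafields could be described like this. Only the file type and the audio_file key appear in this guide; the other keys, the switch, text, and number type names, and the MetafieldSketch shape are assumptions, so check them against the object type settings in your own project.

```typescript
// Assumed metafield shape for illustration; not an official Cosmic type.
type MetafieldType = "file" | "switch" | "text" | "number";

interface MetafieldSketch {
  title: string;
  key: string;
  type: MetafieldType;
  required: boolean;
}

const audioMetafields: MetafieldSketch[] = [
  // File metafield referencing the generated MP3 in the Media Library.
  { title: "Audio File", key: "audio_file", type: "file", required: false },
  // Hypothetical editor toggle: should this article get an audio version?
  { title: "Generate Audio", key: "audio_enabled", type: "switch", required: false },
  // Hypothetical: which voice was used, useful when regenerating.
  { title: "Audio Voice", key: "audio_voice", type: "text", required: false },
  // Hypothetical: duration in seconds for player UI and "listen time" labels.
  { title: "Audio Duration (seconds)", key: "audio_duration", type: "number", required: false },
];

// Small helper to list the keys, e.g. for a sanity check before saving.
function metafieldKeys(fields: MetafieldSketch[]): string[] {
  return fields.map((field) => field.key);
}
```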

Implementation: Cosmic SDK Audio Generation

The implementation pattern for generating audio from Cosmic content with the built-in AI audio API has three steps: convert the article to narration-ready text, generate the audio, and link the resulting file to the article.

The first step uses Cosmic AI's text generation with a lightweight model (Claude Haiku 4.5) to convert markdown content into narration-ready text before passing it to the audio generator. Unlike simple HTML stripping, the AI understands context and converts structured elements like tables and code blocks into natural, speakable prose.

There is no need for a separate OpenAI client or API key: Cosmic handles the text cleanup, TTS generation, file storage, and CDN delivery within the platform, and the generated MP3 is automatically available in your Media Library. When updating the article, set the audio metafield to the media object's value, which links the file metafield to the corresponding media asset in your library.
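
The flow described above can be sketched as follows. This is not the Cosmic SDK's actual API: NarrationClient and its three methods are a hypothetical interface that you would back with the real SDK calls from the AI API reference, and storing media.name in the audio_file metafield is an assumption about how the file metafield is linked.

```typescript
type TtsModel = "tts-1" | "tts-1-hd";

interface MediaRef {
  name: string; // assumed identifier used to link a file metafield
  url: string;  // CDN URL of the generated MP3
}

// Hypothetical client surface; adapt each method to the real Cosmic SDK calls.
interface NarrationClient {
  toNarration(markdown: string): Promise<string>;
  generateAudio(input: { text: string; voice: string; model: TtsModel }): Promise<MediaRef>;
  updateMetadata(objectId: string, metadata: Record<string, string>): Promise<void>;
}

interface Article {
  id: string;
  markdown: string;
}

async function generateArticleAudio(
  client: NarrationClient,
  article: Article,
  voice: string = "nova",
  model: TtsModel = "tts-1",
): Promise<MediaRef> {
  // Step 1: turn markdown into speakable prose (AI text cleanup).
  const narration = await client.toNarration(article.markdown);
  if (narration.trim().length === 0) {
    throw new Error(`Article ${article.id} produced no narration text`);
  }

  // Step 2: generate the MP3, which lands in the Media Library.
  const media = await client.generateAudio({ text: narration, voice, model });

  // Step 3: link the media asset to the article's file metafield.
  await client.updateMetadata(article.id, { audio_file: media.name });
  return media;
}
```

Keeping the client behind an interface also makes the flow easy to test with a fake before any tokens are spent.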

Using the API Directly with cURL

You can also generate audio directly via the REST API, for example with cURL.

The response includes the full media object with a CDN URL you can use immediately in your application.

Batch Audio Generation

To generate audio across multiple articles at once, process them in batches.
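
One way to structure a batch run is sketched below, assuming you already have a per-article function that generates and links the audio. The helper name processInBatches and the BatchResult shape are illustrative, and the batch size is a value you would tune to your plan's rate limits, which this guide does not specify.

```typescript
interface BatchResult<T> {
  succeeded: T[];
  failed: { item: T; error: string }[];
}

// Runs worker over items in groups of batchSize, recording failures
// instead of aborting the whole run when one article fails.
async function processInBatches<T>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<void>,
): Promise<BatchResult<T>> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");

  const result: BatchResult<T> = { succeeded: [], failed: [] };

  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);

    // Items within a batch run concurrently; batches run one after another.
    await Promise.all(
      batch.map(async (item) => {
        try {
          await worker(item);
          result.succeeded.push(item);
        } catch (err) {
          const message = err instanceof Error ? err.message : String(err);
          result.failed.push({ item, error: message });
        }
      }),
    );
  }

  return result;
}
```

The failed list gives you a retry queue, so a transient error on one article does not force you to regenerate audio for the others.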

Automating Audio with Cosmic AI Agents

For a fully hands-off approach, use Cosmic AI Agents to automatically generate audio whenever new content is published. Set up an event-triggered Content Agent that listens for publish events:

1. Navigate to the AI Agents page in your project
2. Click Create Agent and select Content Agent
3. Set the prompt: "When a blog post is published, generate an audio version using the nova voice and update the audio_file metafield with the result."
4. Enable Event Triggers and select Object Published for your blog posts object type

You can also create a Workflow that chains audio generation with other tasks, such as generating social media content or translating the article into other languages, all triggered by a single publish event.

Deployment and Caching Strategies

Audio generation consumes AI tokens, so implement these optimizations:

1. Pre-generate on publish: Use Cosmic webhooks or event-triggered agents to generate audio when content is published, not on user request.

2. CDN caching: Generated audio files are stored in Cosmic's Media Library and delivered through the global CDN for edge caching. No additional CDN configuration is needed.

3. Selective generation: Not all content needs audio. Add a toggle metafield to let editors choose which articles get audio versions.

4. Choose the right quality tier: Use standard (tts-1) for most content. Reserve HD (tts-1-hd) for premium content where higher audio fidelity justifies the 2x token cost.

5. Automatic chunking: Cosmic handles long texts automatically. Articles over 4,096 characters are split at paragraph boundaries and concatenated into a single MP3, so you do not need to manage chunking logic yourself.
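
Points 1, 3, and 4 can be combined into one decision that a publish webhook handler or script runs before spending tokens. The sketch below assumes a simplified ArticleState with hypothetical field names (audioEnabled for the editor toggle, premium for content that warrants HD); map them to whatever metafields your content model actually uses.

```typescript
type TtsModel = "tts-1" | "tts-1-hd";

// Hypothetical, simplified view of an article at publish time.
interface ArticleState {
  status: "draft" | "published";
  audioEnabled: boolean; // editor toggle metafield
  hasAudio: boolean;     // true when the audio file metafield is already set
  premium: boolean;      // content where HD quality is worth the extra cost
}

interface AudioJob {
  model: TtsModel;
}

// Returns the job to run, or null when no audio should be generated.
function planAudioJob(article: ArticleState): AudioJob | null {
  if (article.status !== "published") return null; // pre-generate on publish only
  if (!article.audioEnabled) return null;          // selective generation
  if (article.hasAudio) return null;               // avoid paying twice
  return { model: article.premium ? "tts-1-hd" : "tts-1" };
}
```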

Accessibility Compliance

When implementing voice features, ensure WCAG 2.1 compliance:

Provide transcripts: Audio content must have text alternatives (WCAG 1.2.1). Since your source content is already text, link to it alongside the audio player.
Player controls: Include play, pause, volume, and playback speed controls
Keyboard navigation: Ensure the audio player is fully keyboard accessible
Visual indicators: Show current playback position and remaining time
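
A minimal sketch of player markup that covers most of this checklist with the browser's native audio element is shown below. The renderAudioPlayer helper, the class name, and the link text are illustrative. Native controls provide play, pause, volume, a position indicator, and keyboard access; a playback speed control is not guaranteed by every browser's default UI, so you may need to add one that sets the element's playbackRate property.

```typescript
interface PlayerInput {
  audioUrl: string;      // CDN URL of the generated MP3
  title: string;         // article title, used as the visible label
  transcriptUrl: string; // link to the source text (the transcript)
}

// Escapes characters that are unsafe in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper returning server-rendered player markup.
function renderAudioPlayer(input: PlayerInput): string {
  return [
    `<figure class="article-audio">`,
    `  <figcaption>Listen to: ${escapeHtml(input.title)}</figcaption>`,
    `  <audio controls preload="none" src="${escapeHtml(input.audioUrl)}"></audio>`,
    `  <a href="${escapeHtml(input.transcriptUrl)}">Read the text version</a>`,
    `</figure>`,
  ].join("\n");
}
```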

Understanding Token Costs

Cosmic AI audio generation uses tokens as the unit of usage. In general, 1 token equals roughly 1 word or 4 characters. Audio generation tokens are counted as output tokens, which reflect the computational requirements of producing speech.
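
For a rough budget before generating, the 4-characters-per-token rule of thumb can be turned into an estimate. This is a back-of-the-envelope sketch, not Cosmic's billing formula: the estimateAudioTokens helper is hypothetical, and the HD multiplier of 2 is taken from the "2x token cost" figure mentioned earlier in this guide.

```typescript
type TtsModel = "tts-1" | "tts-1-hd";

const CHARS_PER_TOKEN = 4; // rule of thumb from this guide
const HD_MULTIPLIER = 2;   // assumption based on the "2x token cost" for HD

// Estimates tokens consumed when converting text to speech.
function estimateAudioTokens(text: string, model: TtsModel = "tts-1"): number {
  const base = Math.ceil(text.length / CHARS_PER_TOKEN);
  return model === "tts-1-hd" ? base * HD_MULTIPLIER : base;
}
```

Under these assumptions, a 4,000-character article comes to about 1,000 tokens on the standard tier and about 2,000 on HD. Treat the result as an order-of-magnitude guide and compare it with the figures in your project's Usage section.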

You can monitor your AI token usage from the Usage section of your project settings. For details on plan limits and token add-ons, visit the pricing page.

To optimize costs:

Use the standard model (tts-1) for routine content
Reserve HD (tts-1-hd) for high-visibility or premium content
Skip audio generation for short-form content like product updates or changelogs
Track usage patterns monthly and adjust your generation strategy accordingly

Putting It All Together

Voice-enabled content transforms how users interact with your applications. With Cosmic's built-in AI audio generation, you do not need to manage separate TTS API keys, handle file storage, or configure CDN delivery. Everything is handled within the platform.

Start by generating a few audio files from the dashboard to test voice options. Then integrate the SDK into your publish workflow for automated generation. For fully autonomous audio production, set up an event-triggered agent that generates audio every time new content goes live.

Your content becomes instantly more accessible, engaging, and versatile with just a few lines of code. Log in to your Cosmic account and start generating audio today.

Resources

Introducing AI Audio Generation - Announcement blog post
AI API Reference - Full API documentation for audio generation
AI Dashboard Docs - Dashboard usage guide
API Reference - Complete Cosmic API documentation
