AI

Learn about AI capabilities in the API: generate text, images, videos, and audio using Cosmic AI.

Cosmic provides AI-powered text, image, video, and audio generation capabilities through the API and SDK.

Base URL

Use the following base URL for all AI endpoints (text, image, video, and audio).

https://workers.cosmicjs.com

POST /v3/buckets/:bucket_slug/ai/text

Generate Text

This endpoint enables you to generate text content using AI models.

Send the request payload as JSON with the Content-Type: application/json header set. A bucket write key is required for all AI operations.

Required parameters

You must provide either a prompt or messages parameter.

Name	Type	Description
prompt	string	A text prompt for the AI to generate content from. Use this for simple, single-prompt text generation.
messages	array	An array of message objects for chat-based models. Each message object should have a role (either "user" or "assistant") and content (the message text).
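The "either prompt or messages" rule can be checked on the client before a request is sent. The sketch below is one possible way to do that; the `TextRequest` type and the `validateTextRequest` helper are hypothetical names introduced for this example and are not part of the Cosmic SDK. Whether the API accepts both fields at once is not stated here, so the check only rejects payloads that contain neither.

```typescript
// Message shape from the table above. `role` is typed as string here so
// that invalid values can be detected at runtime.
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical request type for this sketch; field names follow the docs.
interface TextRequest {
  prompt?: string;
  messages?: ChatMessage[];
  max_tokens?: number;
}

// Returns a list of problems; an empty list means the payload looks valid.
function validateTextRequest(req: TextRequest): string[] {
  const problems: string[] = [];
  const hasPrompt = typeof req.prompt === 'string' && req.prompt.trim() !== '';
  const hasMessages = Array.isArray(req.messages) && req.messages.length > 0;

  if (!hasPrompt && !hasMessages) {
    problems.push('Provide either `prompt` or `messages`.');
  }

  (req.messages || []).forEach((message, index) => {
    if (message.role !== 'user' && message.role !== 'assistant') {
      problems.push(`messages[${index}].role must be "user" or "assistant".`);
    }
    if (typeof message.content !== 'string' || message.content === '') {
      problems.push(`messages[${index}].content must be a non-empty string.`);
    }
  });

  return problems;
}
```

A payload such as `{ messages: [{ role: 'user', content: 'Hello' }] }` passes this check, while an empty object is rejected.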

Optional parameters

Name	Type	Description
model	string	The AI model to use for text generation. Options include Claude models (e.g., claude-opus-4-6), Gemini models (e.g., gemini-3.1-pro-preview), and OpenAI models (e.g., gpt-5.2-codex). Defaults to claude-opus-4-6. See the Available Models section for the full list.
max_tokens	number	The maximum number of tokens to generate in the response. Higher values allow for longer responses but may increase processing time.
media_url	string	URL of a file to analyze. Can be any file type available in your Bucket including images, PDFs, Excel spreadsheets, Word documents, and more. The AI model will be able to analyze the content when generating text. Can be used with either prompt or messages.
stream	boolean	When set to true, enables streaming for real-time responses as they're generated, rather than waiting for the complete response. Default is false.

Request Examples

import { createBucketClient } from '@cosmicjs/sdk'

const cosmic = createBucketClient({
  bucketSlug: 'BUCKET_SLUG',
  readKey: 'BUCKET_READ_KEY',
  writeKey: 'BUCKET_WRITE_KEY'
})

// Using a simple prompt
const textResponse = await cosmic.ai.generateText({
  prompt: 'Write a product description for a coffee mug',
  max_tokens: 500
})

console.log(textResponse.text)
console.log(textResponse.usage)

Media Analysis Examples

import { createBucketClient } from '@cosmicjs/sdk'

const cosmic = createBucketClient({
  bucketSlug: 'BUCKET_SLUG',
  readKey: 'BUCKET_READ_KEY',
  writeKey: 'BUCKET_WRITE_KEY'
})

const imageAnalysis = await cosmic.ai.generateText({
  prompt: 'Describe this image in detail and suggest a caption for social media',
  media_url: 'https://cdn.cosmicjs.com/mountain-landscape.jpg',
  max_tokens: 500
})

console.log(imageAnalysis.text)
console.log(imageAnalysis.usage)

cURL Examples

curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/text \
  -d '{"prompt":"Write a product description for a coffee mug","max_tokens":500}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"
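If you call the endpoint without the SDK or cURL, the same request can be assembled for `fetch`. This is a minimal sketch: `prepareTextRequest` and `PreparedRequest` are hypothetical names for this example, and only the URL, method, and headers shown in the cURL example are taken from the documentation.

```typescript
// Everything fetch needs to send the request.
interface PreparedRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds the request described above without sending it.
function prepareTextRequest(
  bucketSlug: string,
  writeKey: string,
  payload: Record<string, unknown>
): PreparedRequest {
  return {
    url: `https://workers.cosmicjs.com/v3/buckets/${encodeURIComponent(bucketSlug)}/ai/text`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${writeKey}`,
    },
    body: JSON.stringify(payload),
  };
}

// Usage (not executed here, since it needs a real bucket and write key):
// const { url, ...init } = prepareTextRequest(slug, key, {
//   prompt: 'Write a product description for a coffee mug',
//   max_tokens: 500,
// });
// const data = await (await fetch(url, init)).json();
```

Keeping the request construction separate from the network call makes it easy to log or unit-test the payload before spending tokens.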

Response Examples

{
  "text": "Introducing our Artisan Ceramic Coffee Mug – the perfect companion for your daily brew. Crafted with care from high-quality ceramic, this elegant mug retains heat longer, ensuring your coffee stays at the ideal temperature. The ergonomic handle provides a comfortable grip, while the smooth, glazed interior prevents staining and makes cleaning effortless. Available in a range of sophisticated colors to match any kitchen aesthetic, this 12oz mug strikes the perfect balance between style and functionality.",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 89
  }
}

Streaming Capabilities

Text generation supports real-time streaming responses, allowing you to receive and display content as it's being generated.

Using with the SDK

import { TextStreamingResponse } from '@cosmicjs/sdk';

// Enable streaming with the stream: true parameter
const result = await cosmic.ai.generateText({
  prompt: 'Tell me about coffee mugs',
  // or use messages array format
  max_tokens: 500,
  stream: true // Enable streaming
});

// Cast the result to TextStreamingResponse
const stream = result as TextStreamingResponse;

// Option 1: Event-based approach
let fullResponse = '';
stream.on('text', (text) => {
  fullResponse += text;
  process.stdout.write(text); // Display text as it arrives
});
stream.on('usage', (usage) => console.log('Usage:', usage));
stream.on('end', (data) => console.log('Complete:', fullResponse));
stream.on('error', (error) => console.error('Error:', error));

// Option 2: For-await loop approach
async function processStream() {
  let fullResponse = '';
  try {
    for await (const chunk of stream) {
      if (chunk.text) {
        fullResponse += chunk.text;
        process.stdout.write(chunk.text);
      }
    }
    console.log('\nComplete text:', fullResponse);
  } catch (error) {
    console.error('Error:', error);
  }
}

Using the simplified stream method

// Simplified streaming method
const stream = await cosmic.ai.stream({
  prompt: 'Tell me about coffee mugs',
  max_tokens: 500,
});

// Process stream using events or for-await loop as shown above

Using cURL for streaming

curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/text \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}" \
  -d '{"prompt":"Tell me about coffee mugs","stream":true}' \
  --no-buffer

The TextStreamingResponse supports two usage patterns:

Event-based: Extends EventEmitter with these events:
- text: New text fragments
- usage: Token usage information
- end: Final data when stream completes
- error: Stream errors

AsyncIterator: Works with for-await loops, yielding chunk objects that contain:
- text: Text fragments
- usage: Token usage information
- end: Set to true for the final chunk
- error: Error information
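The AsyncIterator pattern can be wrapped in a small helper that gathers the full text and the usage data. The sketch below assumes the chunk fields listed above; the precise types of `usage` and `error` are assumptions, and `collectStream` and `fakeStream` are hypothetical names. `fakeStream` only stands in for a real stream so the helper can be exercised offline.

```typescript
// Token usage as shown in the response examples.
interface StreamUsage {
  input_tokens: number;
  output_tokens: number;
}

// Chunk shape based on the list above (field types are assumed).
interface StreamChunk {
  text?: string;
  usage?: StreamUsage;
  end?: boolean;
  error?: unknown;
}

// Consumes any async iterable of chunks and returns the combined result.
async function collectStream(
  chunks: AsyncIterable<StreamChunk>
): Promise<{ text: string; usage: StreamUsage | undefined }> {
  let text = '';
  let usage: StreamUsage | undefined;
  for await (const chunk of chunks) {
    if (chunk.error) throw new Error(`Stream error: ${String(chunk.error)}`);
    if (chunk.text) text += chunk.text;
    if (chunk.usage) usage = chunk.usage;
    if (chunk.end) break; // final chunk reached
  }
  return { text, usage };
}

// Stand-in for a real stream, used only for local testing.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { text: 'Hello, ' };
  yield { text: 'world' };
  yield { usage: { input_tokens: 5, output_tokens: 2 }, end: true };
}
```

With a real stream you would pass the `TextStreamingResponse` in place of `fakeStream()`.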

POST /v3/buckets/:bucket_slug/ai/video

Generate Video

This endpoint enables you to create AI-generated videos using Google's Veo 3.1 models with native audio generation.

Send the request payload as JSON with the Content-Type: application/json header set. Video generation is an asynchronous operation that typically takes 30-90 seconds with Veo 3.1 Fast and 60-180 seconds with Veo 3.1 Standard.

Content Policy: Video generation will fail if the prompt contains references to real people (celebrities, public figures), copyrighted characters, or other policy-violating content. If you receive a "No videos generated" error, try rephrasing your prompt to avoid these elements.

Required parameters

Name	Type	Description
prompt	string	A detailed text description of the video you want to generate. More descriptive prompts yield better results.

Optional parameters

Name	Type	Description
model	string	The video generation model to use. Options: veo-3.1-fast-generate-preview (recommended, faster) or veo-3.1-generate-preview (premium quality). Defaults to veo-3.1-fast-generate-preview.
duration	number	Video duration in seconds. Options: 4, 6, or 8. Defaults to 8.
resolution	string	Video resolution. Options: 720p or 1080p. Defaults to 720p.
reference_images	array	Array with 1 reference image URL to use as the first frame for video generation. Veo uses image-to-video mode to animate from this starting frame, ensuring precise control over the initial appearance, composition, and style. Ideal for product showcases, character consistency, and brand-accurate animations. Maximum 1 image.
folder	string	Media folder to store the generated video in. (Folder must exist)
metadata	object	User-added JSON metadata for the generated video.
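Because video generation is slow and consumes many tokens, it can help to check the options locally first. The sketch below encodes the allowed values from the tables above; `VideoOptions` and `validateVideoOptions` are hypothetical names for this example, and the server remains the authority on what is accepted.

```typescript
// Subset of the request fields that have documented constraints.
interface VideoOptions {
  prompt: string;
  model?: string;
  duration?: number;
  resolution?: string;
  reference_images?: string[];
}

const VIDEO_MODELS = ['veo-3.1-fast-generate-preview', 'veo-3.1-generate-preview'];
const VIDEO_DURATIONS = [4, 6, 8];
const VIDEO_RESOLUTIONS = ['720p', '1080p'];

// Returns a list of problems; an empty list means the options look valid.
function validateVideoOptions(options: VideoOptions): string[] {
  const problems: string[] = [];
  if (options.prompt.trim() === '') {
    problems.push('prompt is required.');
  }
  if (options.model !== undefined && VIDEO_MODELS.indexOf(options.model) === -1) {
    problems.push(`model must be one of: ${VIDEO_MODELS.join(', ')}.`);
  }
  if (options.duration !== undefined && VIDEO_DURATIONS.indexOf(options.duration) === -1) {
    problems.push('duration must be 4, 6, or 8 seconds.');
  }
  if (options.resolution !== undefined && VIDEO_RESOLUTIONS.indexOf(options.resolution) === -1) {
    problems.push('resolution must be 720p or 1080p.');
  }
  if ((options.reference_images || []).length > 1) {
    problems.push('reference_images accepts at most 1 image.');
  }
  return problems;
}
```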

Request Examples

import { createBucketClient } from '@cosmicjs/sdk'

const cosmic = createBucketClient({
  bucketSlug: 'BUCKET_SLUG',
  readKey: 'BUCKET_READ_KEY',
  writeKey: 'BUCKET_WRITE_KEY'
})

const videoResponse = await cosmic.ai.generateVideo({
  prompt: 'A calico kitten playing with a ball of yarn in golden sunlight',
  duration: 8,
  resolution: '720p',
  folder: 'ai-videos'
})

console.log(videoResponse.media.url)
console.log(videoResponse.usage)

Response Example

{
  "media": {
    "id": "65f8a3b2c4d5e6f7g8h9i0j1",
    "name": "veo-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp4",
    "original_name": "veo-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp4",
    "size": 8450000,
    "type": "video/mp4",
    "bucket": "65a1b2c3d4e5f6g7h8i9j0k1",
    "created_at": "2025-12-20T10:30:00.000Z",
    "folder": "ai-videos",
    "url": "https://cdn.cosmicjs.com/veo-xxx.mp4",
    "imgix_url": "https://imgix.cosmicjs.com/veo-xxx.mp4",
    "alt_text": "A calico kitten playing with a ball of yarn in golden sunlight",
    "metadata": {
      "duration": 8,
      "resolution": "720p",
      "generation_time_seconds": 45
    }
  },
  "usage": {
    "input_tokens": 288000,
    "output_tokens": 0,
    "total_tokens": 288000
  },
  "generation_time_seconds": 45
}

Available Models

Cosmic supports a variety of AI models from Anthropic, Google Gemini, and OpenAI for text, image, video, and audio generation.

Text Generation Models

Anthropic Claude Models

Model	API Model ID	Description
Claude Opus 4.6	`claude-opus-4-6`	Most intelligent model for agents and coding (recommended)
Claude Opus 4.5	`claude-opus-4-5-20251101`	Previous flagship model with improved performance
Claude Opus 4.1	`claude-opus-4-1-20250805`	Most powerful and capable model
Claude Opus 4	`claude-opus-4-20250514`	Previous flagship model
Claude Sonnet 4.5	`claude-sonnet-4-5-20250929`	High-performance model
Claude Sonnet 4	`claude-sonnet-4-20250514`	High-performance model with exceptional reasoning
Claude Haiku 4.5	`claude-haiku-4-5-20251001`	Fastest model with near-frontier intelligence

Google Gemini Models

Model	API Model ID	Description
Gemini 3.1 Pro	`gemini-3.1-pro-preview`	Google's most advanced model with improved reasoning, token efficiency, and agentic performance (recommended)
Gemini 3 Pro 🍌	`gemini-3-pro-preview`	Google's previous flagship model with state-of-the-art reasoning
Gemini 3.1 Flash Image 🍌	`gemini-3.1-flash-image-preview`	Fast, efficient image generation with 512px to 4K support, 14 aspect ratios, and up to 14 reference images (recommended)
Gemini 3 Pro Image 🍌	`gemini-3-pro-image-preview`	Premium image generation for professional assets with up to 4K images
Veo 3.1 Fast 📹	`veo-3.1-fast-generate-preview`	Fast video generation with native audio (recommended for most use cases)
Veo 3.1 Standard 📹	`veo-3.1-generate-preview`	Premium video generation with exceptional quality and cinematic realism

OpenAI GPT Models

Model	API Model ID	Description
GPT-5.2 Codex	`gpt-5.2-codex`	Most intelligent coding model for long-horizon, agentic coding tasks (recommended)
GPT-5.2	`gpt-5.2`	Latest GPT-5 model with improved performance
GPT-5	`gpt-5`	OpenAI's most advanced multimodal model
GPT-5 Mini	`gpt-5-mini`	Cost-effective GPT-5 variant
GPT-5 Nano	`gpt-5-nano`	Ultra-fast and cost-effective for simple tasks
GPT-4.1	`gpt-4.1`	Enhanced coding and reasoning capabilities
GPT-4o	`gpt-4o`	Flagship multimodal model
o1	`o1`	Advanced reasoning model for complex problems
o3	`o3`	Latest reasoning model with improved performance

Text Generation Model Selection

By default, Cosmic uses Claude Opus 4.6 for text generation. You can specify a different model by including the model parameter in your request:

// Using Claude
const response = await cosmic.ai.generateText({
  prompt: 'Write a product description',
  model: 'claude-opus-4-6', // Optional: specify model
  max_tokens: 500
})

// Using Gemini 3.1 Pro for agentic workflows
const geminiResponse = await cosmic.ai.generateText({
  prompt: 'Analyze this content and suggest improvements',
  model: 'gemini-3.1-pro-preview', // Use Gemini 3.1 Pro
  max_tokens: 500
})

// Using GPT-5.2 Codex for coding tasks
const codexResponse = await cosmic.ai.generateText({
  prompt: 'Write a React component that displays a blog post',
  model: 'gpt-5.2-codex', // Use GPT-5.2 Codex
  max_tokens: 2000
})

Or using cURL:

# Using Claude
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/text \
  -d '{"prompt":"Write a product description","model":"claude-opus-4-6","max_tokens":500}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using Gemini 3.1 Pro
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/text \
  -d '{"prompt":"Analyze this content and suggest improvements","model":"gemini-3.1-pro-preview","max_tokens":500}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using GPT-5.2 Codex
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/text \
  -d '{"prompt":"Write a React component that displays a blog post","model":"gpt-5.2-codex","max_tokens":2000}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

Image Generation Models

Google Gemini Image Models

Model	API Model ID	Description	Supported Sizes	Aspect Ratios
Gemini 3.1 Flash Image 🍌	`gemini-3.1-flash-image-preview`	Fast, efficient image generation (default, recommended)	512x512, 1024x1024, 2048x2048, 4096x4096	14 ratios
Gemini 3 Pro Image 🍌	`gemini-3-pro-image-preview`	Premium image generation for professional assets	1024x1024, 2048x2048, 4096x4096	10 ratios

Flash Image supported aspect ratios: 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9

Pro Image supported aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

OpenAI DALL-E Models

Model	API Model ID	Description
DALL-E 3	`dall-e-3`	OpenAI's latest image generation model

Image Generation Model Selection

By default, Cosmic uses Gemini 3.1 Flash Image for image generation. You can explicitly specify the model by including the model parameter in your request.

Gemini image models support reference images for contextual generation. You can provide URLs to existing images, and Gemini will analyze their style, composition, and content to inform the generated image. This is perfect for:

- Maintaining consistent visual styles across generated images
- Creating variations based on existing artwork
- Applying the aesthetic of one image to a new scene
- Combining elements from multiple reference images

Flash Image supports up to 14 reference images (10 object + 4 character references), while Pro Image supports up to 5.

// Using Gemini 3.1 Flash Image (default, recommended)
const flashResponse = await cosmic.ai.generateImage({
  prompt: 'A futuristic cityscape at night with neon lights',
  model: 'gemini-3.1-flash-image-preview',
  size: '1024x1024',
  aspect_ratio: '16:9'
})

// Using Flash Image with 512px for quick thumbnails
const thumbResponse = await cosmic.ai.generateImage({
  prompt: 'Product icon for a note-taking app',
  model: 'gemini-3.1-flash-image-preview',
  size: '512x512'
})

// Using Gemini 3 Pro Image for premium 4K generation
const proResponse = await cosmic.ai.generateImage({
  prompt: 'A futuristic cityscape at night with neon lights',
  model: 'gemini-3-pro-image-preview',
  size: '4096x4096'
})

// Using Gemini with reference images for style consistency
const styledResponse = await cosmic.ai.generateImage({
  prompt: 'A mountain landscape in the same artistic style',
  model: 'gemini-3.1-flash-image-preview',
  reference_images: [
    'https://cdn.cosmicjs.com/your-style-reference.jpg'
  ],
  size: '2048x2048'
})

// Using DALL-E 3
const dalleResponse = await cosmic.ai.generateImage({
  prompt: 'A futuristic cityscape at night',
  model: 'dall-e-3',
  size: '1792x1024',
  quality: 'hd'
})

Or using cURL:

# Using Gemini 3.1 Flash Image (default)
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/image \
  -d '{"prompt":"A futuristic cityscape at night","model":"gemini-3.1-flash-image-preview","size":"1024x1024","aspect_ratio":"16:9"}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using Gemini 3 Pro Image for 4K
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/image \
  -d '{"prompt":"A futuristic cityscape at night","model":"gemini-3-pro-image-preview","size":"4096x4096"}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using DALL-E 3
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/image \
  -d '{"prompt":"A futuristic cityscape at night","model":"dall-e-3","size":"1792x1024","quality":"hd"}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"
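The size, aspect-ratio, and reference-image limits for the two Gemini image models in this section can be captured in a lookup table and checked before sending a request. This is a sketch only: `GEMINI_IMAGE_LIMITS`, `ImageRequest`, and `checkImageRequest` are hypothetical names, and models not listed in the table (such as dall-e-3) are deliberately left unchecked.

```typescript
interface ImageModelLimits {
  sizes: string[];
  aspectRatios: string[];
  maxReferenceImages: number;
}

// Limits as listed in the Gemini image model table and notes above.
const GEMINI_IMAGE_LIMITS: Record<string, ImageModelLimits> = {
  'gemini-3.1-flash-image-preview': {
    sizes: ['512x512', '1024x1024', '2048x2048', '4096x4096'],
    aspectRatios: ['1:1', '1:4', '1:8', '2:3', '3:2', '3:4', '4:1', '4:3',
      '4:5', '5:4', '8:1', '9:16', '16:9', '21:9'],
    maxReferenceImages: 14,
  },
  'gemini-3-pro-image-preview': {
    sizes: ['1024x1024', '2048x2048', '4096x4096'],
    aspectRatios: ['1:1', '2:3', '3:2', '3:4', '4:3', '4:5', '5:4', '9:16', '16:9', '21:9'],
    maxReferenceImages: 5,
  },
};

interface ImageRequest {
  prompt: string;
  model?: string;
  size?: string;
  aspect_ratio?: string;
  reference_images?: string[];
}

// Returns a list of problems; an empty list means no documented limit is exceeded.
function checkImageRequest(request: ImageRequest): string[] {
  // Flash Image is the documented default model.
  const model = request.model === undefined ? 'gemini-3.1-flash-image-preview' : request.model;
  if (!Object.prototype.hasOwnProperty.call(GEMINI_IMAGE_LIMITS, model)) {
    return []; // not a Gemini image model in the table: nothing to check here
  }
  const limits = GEMINI_IMAGE_LIMITS[model];
  const problems: string[] = [];
  if (request.size !== undefined && limits.sizes.indexOf(request.size) === -1) {
    problems.push(`${model} does not support size ${request.size}.`);
  }
  if (request.aspect_ratio !== undefined && limits.aspectRatios.indexOf(request.aspect_ratio) === -1) {
    problems.push(`${model} does not support aspect ratio ${request.aspect_ratio}.`);
  }
  if ((request.reference_images || []).length > limits.maxReferenceImages) {
    problems.push(`${model} accepts at most ${limits.maxReferenceImages} reference images.`);
  }
  return problems;
}
```

For example, requesting `512x512` from Pro Image is flagged, because that size is listed only for Flash Image.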

Video Generation Models

Google Veo Models

Model	API Model ID	Description
Veo 3.1 Fast	`veo-3.1-fast-generate-preview`	Faster generation, excellent quality (30-90s)
Veo 3.1 Standard	`veo-3.1-generate-preview`	Premium quality, cinematic realism (60-180s)

Video Generation Model Selection

By default, Cosmic uses Veo 3.1 Fast for video generation. Both models support:

- Native Audio: Automatically generated audio that matches the scene
- Flexible Durations: 4, 6, or 8 seconds
- High Resolution: 720p or 1080p
- Image-to-Video Mode: Use 1 reference image as the starting frame

Veo 3.1 Fast is recommended for most use cases, offering excellent quality with faster generation times. Use Veo 3.1 Standard for premium marketing content and cinematic quality.

// Using Veo Fast (recommended)
const fastVideo = await cosmic.ai.generateVideo({
  prompt: 'A peaceful zen garden with koi fish swimming',
  model: 'veo-3.1-fast-generate-preview',
  duration: 8,
  resolution: '720p'
})

// Using Veo Standard for premium quality
const premiumVideo = await cosmic.ai.generateVideo({
  prompt: 'Cinematic shot of city skyline at golden hour',
  model: 'veo-3.1-generate-preview',
  duration: 8,
  resolution: '1080p'
})

// Using reference image as starting frame (image-to-video)
const styledVideo = await cosmic.ai.generateVideo({
  prompt: 'Product rotates smoothly revealing all angles',
  model: 'veo-3.1-fast-generate-preview',
  duration: 6,
  reference_images: [
    'https://cdn.cosmicjs.com/product-hero.jpg'
  ]
})

Or using cURL:

# Using Veo Fast
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/video \
  -d '{"prompt":"A peaceful zen garden with koi fish","model":"veo-3.1-fast-generate-preview","duration":8}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using Veo Standard for premium quality
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/video \
  -d '{"prompt":"Cinematic city skyline at golden hour","model":"veo-3.1-generate-preview","duration":8,"resolution":"1080p"}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

Audio Generation Models

OpenAI TTS Models

Model	API Model ID	Description	Voices
TTS Standard	`tts-1`	Fast, low-latency text-to-speech (recommended)	9
TTS HD	`tts-1-hd`	Higher quality audio (2x cost)	9

Available Voices

Feminine voices

Voice	Description	Best for
nova	Warm and bright, great for friendly narration (default)	Podcasts, explainers
shimmer	Soft and intimate, gentle delivery	Meditation, ASMR, bedtime stories
coral	Clear and polished, professional tone	Product demos, business content
sage	Calm and steady, thoughtful pacing	Education, tutorials
alloy	Neutral and balanced, versatile default	General purpose, articles

Masculine voices

Voice	Description	Best for
echo	Deep and authoritative, confident delivery	Announcements, trailers, news
onyx	Bold and commanding, strong presence	Intros, branding, dramatic reads
fable	Animated and expressive, natural storyteller	Storytelling, audiobooks
ash	Warm and approachable, conversational	Conversational, interviews

Audio Generation Model Selection

By default, Cosmic uses tts-1 with the nova voice. Texts longer than 4,096 characters are automatically split at paragraph boundaries and concatenated into a single audio file.

// Using default voice (nova)
const audio = await cosmic.ai.generateAudio({
  prompt: 'Welcome to our product walkthrough.',
})

// Using a specific voice and HD quality
const hdAudio = await cosmic.ai.generateAudio({
  prompt: articleContent,
  voice: 'sage',
  model: 'tts-1-hd',
  folder: 'podcasts'
})

Or using cURL:

# Using default settings
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/audio \
  -d '{"prompt":"Welcome to our product walkthrough."}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"

# Using HD model with specific voice
curl https://workers.cosmicjs.com/v3/buckets/${BUCKET_SLUG}/ai/audio \
  -d '{"prompt":"Your article text here...","voice":"sage","model":"tts-1-hd"}' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${BUCKET_WRITE_KEY}"
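Cosmic splits long texts on the server, and its exact algorithm is not documented here. As a rough illustration of what splitting at paragraph boundaries can look like, the sketch below groups paragraphs into chunks of at most 4,096 characters. The function name `chunkAtParagraphs`, the blank-line paragraph separator, and the hard split for a single oversized paragraph are all assumptions of this example, not Cosmic's implementation.

```typescript
// Groups paragraphs into chunks no longer than `limit` characters.
function chunkAtParagraphs(text: string, limit: number = 4096): string[] {
  if (limit < 1) throw new Error('limit must be at least 1');
  const chunks: string[] = [];
  let current = '';

  for (const paragraph of text.split(/\n{2,}/)) {
    // A paragraph longer than the limit is cut into limit-sized pieces.
    const pieces: string[] = [];
    for (let i = 0; i < paragraph.length; i += limit) {
      pieces.push(paragraph.slice(i, i + limit));
    }

    for (const piece of pieces) {
      const candidate = current === '' ? piece : current + '\n\n' + piece;
      if (candidate.length <= limit) {
        current = candidate; // still fits in the current chunk
      } else {
        chunks.push(current); // close the current chunk and start a new one
        current = piece;
      }
    }
  }

  if (current !== '') chunks.push(current);
  return chunks;
}
```

You do not need to do this yourself before calling the audio endpoint; the sketch is mainly useful for estimating how many chunks (the `chunk_count` metadata field) a long article might produce.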

Using Reference Images with Video Generation (Image-to-Video Mode)

Veo 3.1 supports image-to-video mode, where you provide 1 reference image that becomes the first frame of your video. This ensures precise control over the starting appearance, composition, and style.

What Image-to-Video Does:

- Precise Starting Frame: Your reference image becomes the exact first frame of the video
- Product Accuracy: Start with your exact product photo and animate from there
- Character Consistency: Begin with a specific character pose or appearance
- Brand Control: Ensure videos start with approved brand imagery

Best Practices:

// Product showcase starting from hero image
const productVideo = await cosmic.ai.generateVideo({
  prompt: 'Product rotates smoothly, revealing all angles with studio lighting',
  reference_images: [
    'https://cdn.cosmicjs.com/product-hero-front.jpg'
  ],
  duration: 8
})

// Character animation from specific pose
const characterVideo = await cosmic.ai.generateVideo({
  prompt: 'Character waves and smiles at camera with friendly expression',
  reference_images: [
    'https://cdn.cosmicjs.com/character-neutral-pose.jpg'
  ],
  duration: 6
})

// Brand scene animation
const brandVideo = await cosmic.ai.generateVideo({
  prompt: 'Camera slowly zooms in while maintaining brand aesthetic',
  reference_images: [
    'https://cdn.cosmicjs.com/brand-scene-wide.jpg'
  ],
  duration: 8
})

Guidelines:

- Use high-resolution images (1024x1024+ recommended)
- Only 1 reference image supported (becomes the first frame)
- Ensure URL is publicly accessible
- Your prompt should describe the motion/animation starting from that image
- Think of it as "how should this image come to life?"

Advanced Veo 3.1 Capabilities

According to Google's Veo 3.1 documentation, Veo 3.1 supports several advanced features:

✅ Image-to-Video Mode

Use 1 reference image as the first frame of your video. Veo animates from this starting point with precise control over initial appearance. See examples above.

✅ Video Extension

Extend previously generated videos to create longer content. See the Extend Video section below for full documentation.

POST /v3/buckets/:bucket_slug/ai/video/extend

Extend Video

This endpoint enables you to extend a previously generated Veo video by creating a new 8-second clip that continues from the final second of the original video.

Video extension creates a new video that maintains visual continuity with the original. Only videos generated with Veo can be extended (they must have a veo_file_uri in their metadata). Extension is always 8 seconds at 720p resolution.

Required parameters

Name	Type	Description
prompt	string	A text description of how to continue the video. Should describe the motion/action that follows the original video.
media_id	string	The ID of the original Veo-generated video to extend. The video must have been generated using the Generate Video endpoint.

Optional parameters

Name	Type	Description
model	string	The video generation model to use. Options: veo-3.1-fast-generate-preview (recommended) or veo-3.1-generate-preview. Defaults to veo-3.1-fast-generate-preview.
folder	string	Media folder to store the extended video in. (Folder must exist)
metadata	object	User-added JSON metadata for the extended video.

Limitations

- Duration: Extensions are always 8 seconds
- Resolution: Extensions are always 720p (even if the original was 1080p)
- Source: Only Veo-generated videos can be extended (must have veo_file_uri in metadata)
- Chaining: Extended videos can also be extended, enabling creation of longer narratives
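The source requirement above can be checked on a media object before calling the extend endpoint. This is a minimal sketch that assumes the media object carries a `metadata` object like the one in the response examples; `MediaLike` and `canExtendVideo` are hypothetical names for this example.

```typescript
// Minimal view of a media object; only the fields used by the check.
interface MediaLike {
  id: string;
  metadata?: Record<string, unknown> | null;
}

// True when the media has a non-empty veo_file_uri in its metadata.
function canExtendVideo(media: MediaLike): boolean {
  const uri = media.metadata ? media.metadata['veo_file_uri'] : undefined;
  return typeof uri === 'string' && uri.length > 0;
}
```

Running this check first avoids sending an extension request for uploaded or non-Veo videos, which cannot be extended.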

Request Examples

import { createBucketClient } from '@cosmicjs/sdk'

const cosmic = createBucketClient({
  bucketSlug: 'BUCKET_SLUG',
  readKey: 'BUCKET_READ_KEY',
  writeKey: 'BUCKET_WRITE_KEY'
})

// First, generate an initial video
const initialVideo = await cosmic.ai.generateVideo({
  prompt: 'A calico kitten sitting peacefully in golden sunlight',
  duration: 8,
  resolution: '720p'
})

console.log('Initial video ID:', initialVideo.media.id)

// Then extend it with a continuation
const extendedVideo = await cosmic.ai.extendVideo({
  media_id: initialVideo.media.id,
  prompt: 'The kitten stands up and walks away into the garden'
})

console.log('Extended video URL:', extendedVideo.media.url)
console.log('Source video ID:', extendedVideo.source_media_id)

Response Example

{
  "media": {
    "id": "65f8b4c3d5e6f7g8h9i0j1k2",
    "name": "veo-ext-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp4",
    "original_name": "veo-ext-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp4",
    "size": 8650000,
    "type": "video/mp4",
    "bucket": "65a1b2c3d4e5f6g7h8i9j0k1",
    "created_at": "2025-12-20T10:35:00.000Z",
    "folder": null,
    "url": "https://cdn.cosmicjs.com/veo-ext-xxx.mp4",
    "imgix_url": "https://imgix.cosmicjs.com/veo-ext-xxx.mp4",
    "alt_text": "The scene continues with the character walking away",
    "metadata": {
      "duration": 8,
      "resolution": "720p",
      "generation_time_seconds": 52,
      "is_extension": true,
      "source_media_id": "65f8a3b2c4d5e6f7g8h9i0j1",
      "veo_file_uri": "https://generativelanguage.googleapis.com/v1beta/files/..."
    }
  },
  "usage": {
    "input_tokens": 288000,
    "output_tokens": 0,
    "total_tokens": 288000
  },
  "generation_time_seconds": 52,
  "source_media_id": "65f8a3b2c4d5e6f7g8h9i0j1",
  "is_extension": true
}

Video Extension Use Cases

Creating Longer Narratives

Chain multiple 8-second extensions to create minute-long videos or longer stories:

// Build a 32-second video (4 segments)
let currentVideo = await cosmic.ai.generateVideo({
  prompt: 'Opening scene: sunrise over mountains',
  duration: 8
})

const segments = [currentVideo]

for (const continuation of [
  'Birds take flight from the trees below',
  'A hiker appears on the trail in the distance',
  'Close-up of the hiker reaching the summit'
]) {
  currentVideo = await cosmic.ai.extendVideo({
    media_id: currentVideo.media.id,
    prompt: continuation
  })
  segments.push(currentVideo)
}

console.log(`Created ${segments.length} connected segments (${segments.length * 8}s total)`)

Extending Product Showcases

Continue product demonstrations with smooth transitions:

const productIntro = await cosmic.ai.generateVideo({
  prompt: 'Product slowly rotates on white background',
  reference_images: ['https://cdn.cosmicjs.com/product.jpg'],
  duration: 8
})

const productDetails = await cosmic.ai.extendVideo({
  media_id: productIntro.media.id,
  prompt: 'Camera zooms in to show product texture and details'
})

Building Background Loops

Create longer ambient footage for websites or presentations:

const ambientBase = await cosmic.ai.generateVideo({
  prompt: 'Gentle rain falling on window with city lights blurred behind',
  duration: 8
})

const ambientExtended = await cosmic.ai.extendVideo({
  media_id: ambientBase.media.id,
  prompt: 'Rain continues with occasional lightning flash in the distance'
})

POST /v3/buckets/:bucket_slug/ai/audio

Generate Audio

This endpoint converts text to natural-sounding speech using OpenAI's text-to-speech models. The generated audio is automatically uploaded to your Cosmic media library as an MP3 file.

Send the request payload as JSON with the Content-Type: application/json header set. Texts longer than 4,096 characters are automatically chunked at paragraph boundaries and concatenated into a single audio file.

Required parameters

Name	Type	Description
prompt	string	The text to convert to speech. Can be any length; texts over 4,096 characters are automatically split and processed in chunks.

Optional parameters

Name	Type	Description
voice	string	Voice to use for speech generation. Feminine: nova (default), shimmer, coral, sage, alloy. Masculine: echo, onyx, fable, ash.
model	string	TTS model to use: tts-1 (default, optimized for speed) or tts-1-hd (higher quality, 2x cost).
folder	string	Media folder to store the generated audio in. (Folder must exist)
metadata	object	User-added JSON metadata for the audio file.
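The voice and model options above can be validated while building the request body. The sketch below is one way to do it; `buildAudioRequest` and `AudioRequestBody` are hypothetical names for this example. It fills in the documented defaults (nova and tts-1) explicitly, although the API applies them on its own when the fields are omitted.

```typescript
const TTS_VOICES = ['nova', 'shimmer', 'coral', 'sage', 'alloy', 'echo', 'onyx', 'fable', 'ash'];
const TTS_MODELS = ['tts-1', 'tts-1-hd'];

interface AudioRequestBody {
  prompt: string;
  voice: string;
  model: string;
  folder?: string;
}

// Builds a request body, throwing on an unknown voice or model.
function buildAudioRequest(
  prompt: string,
  options: { voice?: string; model?: string; folder?: string } = {}
): AudioRequestBody {
  const voice = options.voice === undefined ? 'nova' : options.voice;
  const model = options.model === undefined ? 'tts-1' : options.model;

  if (prompt.trim() === '') throw new Error('prompt is required');
  if (TTS_VOICES.indexOf(voice) === -1) throw new Error(`Unknown voice: ${voice}`);
  if (TTS_MODELS.indexOf(model) === -1) throw new Error(`Unknown model: ${model}`);

  const body: AudioRequestBody = { prompt, voice, model };
  if (options.folder !== undefined) body.folder = options.folder;
  return body;
}
```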

Request Examples

import { createBucketClient } from '@cosmicjs/sdk'

const cosmic = createBucketClient({
  bucketSlug: 'BUCKET_SLUG',
  readKey: 'BUCKET_READ_KEY',
  writeKey: 'BUCKET_WRITE_KEY'
})

const audio = await cosmic.ai.generateAudio({
  prompt: 'Welcome to our podcast. Today we explore the future of AI-powered content management.',
  voice: 'nova'
})

Response Example

{
  "media": {
    "id": "65f8c5d4e6f7g8h9i0j1k2l3",
    "name": "tts-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp3",
    "original_name": "tts-a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6.mp3",
    "size": 245000,
    "type": "audio/mpeg",
    "bucket": "65a1b2c3d4e5f6g7h8i9j0k1",
    "created_at": "2026-02-21T15:30:00.000Z",
    "folder": null,
    "url": "https://cdn.cosmicjs.com/tts-xxx.mp3",
    "imgix_url": "https://imgix.cosmicjs.com/tts-xxx.mp3",
    "alt_text": "Welcome to our podcast...",
    "metadata": {
      "prompt": "Welcome to our podcast...",
      "model": "tts-1",
      "voice": "nova",
      "duration_estimate": "1 min",
      "char_count": 82,
      "chunk_count": 1
    }
  },
  "usage": {
    "input_tokens": 21,
    "output_tokens": 3600,
    "total_tokens": 3621
  }
}

Usage and Limitations

AI capabilities in Cosmic are subject to the following considerations:

- Rate Limits: AI generation requests may be subject to rate limiting based on your plan.
- Token Usage: Text, image, video, and audio generation consume tokens. Media generation (images, videos, and audio) has higher token costs due to computational requirements.
- Media Storage: Generated images and videos are stored in your Cosmic Bucket's media library and count toward your storage quota.
- Video Generation Time: Video generation is asynchronous and typically takes 30-90 seconds (Fast) or 60-180 seconds (Standard).
- Content Policy: Generated content must comply with Cosmic's terms of service and content policies. If a prompt violates content policies (e.g., celebrity likeness, copyrighted material, or safety concerns), video generation fails with a generic "No videos generated" error rather than a specific policy message. In that case, try rephrasing your prompt to avoid references to real people, brands, or copyrighted characters.
- Regional Restrictions: In the EU, UK, CH, and MENA regions, person generation has limitations (see Google's documentation for details).
- Watermarking: Videos created by Veo are watermarked using SynthID for AI-generated content identification.

Pricing Overview

All AI features consume tokens from your monthly allocation or token packs. Token costs vary by model complexity and media type.

Text Generation Pricing

Text generation uses tiered pricing based on model capability. More powerful models cost more tokens to use, reflecting their higher computational requirements and superior performance.

How Token Multipliers Work: Your actual token deduction is multiplied by the tier multiplier:

Tier	Multiplier	Models	Token Deduction
Budget	1.0x	GPT-5 Nano, GPT-4.1 Nano, GPT-4o Mini, GPT-5 Mini, GPT-4.1 Mini, Claude Haiku 3, Claude Haiku 4.5	1,000 actual tokens = 1,000 deducted
Standard	2.0x	GPT-5, GPT-5.2, GPT-4.1, GPT-4o, o1-mini, o3, o3-mini, o4-mini, Claude Sonnet 4, Claude Sonnet 4.5, Claude Opus 4.5, Claude Opus 4.6, Gemini 3 Pro	1,000 actual tokens = 2,000 deducted
Premium	4.0x	Claude Opus 4, Claude Opus 4.1, o1, o3-pro	1,000 actual tokens = 4,000 deducted

Example: Using Claude Opus 4.6 (Standard tier) with a 1,000 token response will deduct 2,000 tokens from your balance (1,000 × 2.0x).
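The multiplier arithmetic can be written as a one-line function. This sketch uses the three tiers from the table above; `Tier`, `TIER_MULTIPLIER`, and `deductedTokens` are hypothetical names for this example, and which token counts (input, output, or both) feed into `actualTokens` is not specified here.

```typescript
type Tier = 'budget' | 'standard' | 'premium';

// Multipliers from the pricing table above.
const TIER_MULTIPLIER: Record<Tier, number> = {
  budget: 1.0,
  standard: 2.0,
  premium: 4.0,
};

// Tokens deducted from your balance for a given actual token count.
function deductedTokens(actualTokens: number, tier: Tier): number {
  return actualTokens * TIER_MULTIPLIER[tier];
}
```

For the example above, `deductedTokens(1000, 'standard')` returns 2000.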

Media Generation Pricing

Images and videos use fixed token costs, and audio is priced per 1,000 characters of text. All media generation costs are billed as output tokens.

Feature	Token Cost (Output)
DALL-E 3 Image	4,800 tokens
Gemini 1K/2K Image	32,160 tokens
Gemini 4K Image	57,600 tokens
Veo Fast Video (4s)	144,000 tokens
Veo Fast Video (6s)	216,000 tokens
Veo Fast Video (8s)	288,000 tokens
Veo Fast Extension	288,000 tokens
Veo Standard Video (4s)	384,000 tokens
Veo Standard Video (6s)	576,000 tokens
Veo Standard Video (8s)	768,000 tokens
Veo Standard Extension	768,000 tokens
TTS Audio (per 1K chars)	3,600 tokens
TTS HD Audio (per 1K chars)	7,200 tokens

Note: Input tokens (your prompt) for media generation are minimal compared to the output cost. Audio generation tokens scale proportionally with text length. The token costs listed above represent the generation cost and are deducted from your output token allocation.
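To budget for a chained video, you can add the cost of the initial clip to the cost of each extension using the figures in the table above. The sketch below does that arithmetic; `estimateVideoChain` and the constant names are hypothetical, and the result ignores the small input-token cost of the prompts.

```typescript
type VeoVariant = 'fast' | 'standard';
type ClipDuration = 4 | 6 | 8;

// Token costs from the media pricing table above.
const VEO_CLIP_TOKENS: Record<VeoVariant, Record<ClipDuration, number>> = {
  fast: { 4: 144000, 6: 216000, 8: 288000 },
  standard: { 4: 384000, 6: 576000, 8: 768000 },
};
const VEO_EXTENSION_TOKENS: Record<VeoVariant, number> = {
  fast: 288000,
  standard: 768000,
};

// Total footage and token cost for one initial clip plus N extensions.
function estimateVideoChain(
  variant: VeoVariant,
  initialDuration: ClipDuration,
  extensions: number
): { seconds: number; tokens: number } {
  if (extensions < 0 || Math.floor(extensions) !== extensions) {
    throw new Error('extensions must be a non-negative integer');
  }
  return {
    // Each extension is always an 8-second clip.
    seconds: initialDuration + extensions * 8,
    tokens: VEO_CLIP_TOKENS[variant][initialDuration] + extensions * VEO_EXTENSION_TOKENS[variant],
  };
}
```

For instance, an 8-second Veo Fast clip with three extensions comes to 32 seconds of footage and 1,152,000 tokens (4 × 288,000).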

For more information about AI capabilities and pricing, please refer to the Cosmic pricing page or contact Cosmic support.
