whisper
Model ID: @cf/openai/whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Properties
Task Type: Automatic Speech Recognition
Code Examples
Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Download a sample WAV file to transcribe.
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    if (!res.ok) {
      return new Response("Failed to download sample audio", { status: 502 });
    }
    const blob = await res.arrayBuffer();

    // Pass the audio as an array of numbers, one per byte.
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run("@cf/openai/whisper", input);

    // Leave the (large) audio array out of the echoed input.
    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;
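The Worker above transcribes a fixed sample file. A common variation is to transcribe audio that the client uploads in the request body. The sketch below assumes that upload pattern. To keep it self-contained it uses a minimal hypothetical AiBinding interface in place of the Ai type from @cloudflare/workers-types, and builds the JSON response by hand. The handler itself and its status codes are illustrative choices, not part of the documented API.

```typescript
// Minimal stand-in for the Workers AI binding so this sketch compiles on its own.
// In a real Worker, use the Ai type as in the example above.
interface AiBinding {
  run(model: string, input: { audio: number[] }): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// Hypothetical handler: transcribes audio sent as the raw POST body.
const handler = {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Send audio with POST", { status: 405 });
    }
    const bytes = new Uint8Array(await request.arrayBuffer());
    if (bytes.length === 0) {
      return new Response("Empty request body", { status: 400 });
    }
    // Same input shape as the example above: one number per audio byte.
    const response = await env.AI.run("@cf/openai/whisper", {
      audio: Array.from(bytes),
    });
    return new Response(JSON.stringify({ response }), {
      headers: { "content-type": "application/json" },
    });
  },
};

// In a Worker module you would export it: export default handler;
```

A client could then send a file with curl --data-binary "@talking-llama.mp3" to the Worker's URL.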
curl
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
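You can make the same REST call from TypeScript with the standard fetch API. The sketch below only builds the Request so you can inspect it before sending. The function name is hypothetical. The accountId and apiToken parameters stand in for the values the curl command reads from CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.

```typescript
// Hypothetical helper: builds (but does not send) the same request as the curl
// command: a POST of raw audio bytes with a bearer token.
function buildWhisperRestRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): Request {
  const url =
    `https://api.cloudflare.com/client/v4/accounts/${encodeURIComponent(accountId)}` +
    `/ai/run/@cf/openai/whisper`;
  return new Request(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: audio, // raw binary body, like --data-binary
  });
}
```

To send it, call await fetch(buildWhisperRestRequest(accountId, apiToken, audioBytes)) and then parse the JSON body.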
Response
Automatic speech recognition responses always include a text property that holds the full transcription as a single string. If the model supports word-level output, the response also includes word_count and a words array that gives each word with its start and end timestamps.
{
  "text": "It is a good day",
  "word_count": 5,
  "words": [
    { "word": "It", "start": 0.5600000023841858, "end": 1 },
    { "word": "is", "start": 1, "end": 1.100000023841858 },
    { "word": "a", "start": 1.100000023841858, "end": 1.2200000286102295 },
    { "word": "good", "start": 1.2200000286102295, "end": 1.3200000524520874 },
    { "word": "day", "start": 1.3200000524520874, "end": 1.4600000381469727 }
  ]
}
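The words array is convenient for simple post-processing. The sketch below assumes the response has already been parsed into an object shaped like the example above. It prints one line per word and measures the span from the first word's start to the last word's end. The helper names are hypothetical, and the span uses whatever unit the start and end values use.

```typescript
interface WhisperWord {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: one "start-end word" line per word, rounded to two decimals.
function formatWordTimings(words: WhisperWord[]): string[] {
  return words.map((w) => `${w.start.toFixed(2)}-${w.end.toFixed(2)} ${w.word}`);
}

// Hypothetical helper: first word's start to last word's end (0 when there are no words).
function spokenSpan(words: WhisperWord[]): number {
  if (words.length === 0) return 0;
  return words[words.length - 1].end - words[0].start;
}
```

For the sample response, the first line is "0.56-1.00 It" and the span is about 0.9.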
API Schema
The following schemas are written in JSON Schema.
Input JSON Schema
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": { "type": "number" }
        }
      },
      "required": ["audio"]
    }
  ]
}
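The input schema accepts two shapes: raw binary audio, or an object whose audio field is an array of numbers. The TypeScript sketch below models both shapes. It also includes a helper that converts either shape into the object form used in the Workers example. Representing the binary branch as Uint8Array or ArrayBuffer is an assumption about how "format": "binary" maps to TypeScript, and the function names are hypothetical.

```typescript
// Assumed TypeScript view of the two oneOf branches.
type WhisperInput = Uint8Array | ArrayBuffer | { audio: number[] };

// Runtime check for the object branch: "audio" is required and every item
// must be a number, as the schema states.
function isAudioObjectInput(value: unknown): value is { audio: number[] } {
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as { audio?: unknown }).audio;
  return Array.isArray(audio) && audio.every((n) => typeof n === "number");
}

// Converts either branch into { audio: number[] }.
function toAudioObject(input: WhisperInput): { audio: number[] } {
  if ("audio" in input) return input;
  // new Uint8Array(...) copies a typed array or views an ArrayBuffer's bytes.
  return { audio: Array.from(new Uint8Array(input)) };
}
```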
Output JSON Schema
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": { "type": "string" },
    "word_count": { "type": "number" },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": { "type": "string" },
          "start": { "type": "number" },
          "end": { "type": "number" }
        }
      }
    },
    "vtt": { "type": "string" }
  },
  "required": ["text"]
}
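In TypeScript, you can mirror the output schema with an interface and check parsed JSON against it with a type guard. The sketch below follows the schema as written. Only text is required. Fields of individual word items are optional because the schema does not mark them as required. The guard's name and its strictness are illustrative choices.

```typescript
// TypeScript view of the output schema; only "text" is required.
interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: { word?: string; start?: number; end?: number }[];
  vtt?: string;
}

// True for a non-array object whose listed fields, when present, have the schema's types.
function isWordItem(w: unknown): boolean {
  if (typeof w !== "object" || w === null || Array.isArray(w)) return false;
  const o = w as Record<string, unknown>;
  return (
    (o.word === undefined || typeof o.word === "string") &&
    (o.start === undefined || typeof o.start === "number") &&
    (o.end === undefined || typeof o.end === "number")
  );
}

// Hypothetical guard: requires "text" and type-checks any optional fields present.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.text !== "string") return false;
  if (v.word_count !== undefined && typeof v.word_count !== "number") return false;
  if (v.vtt !== undefined && typeof v.vtt !== "string") return false;
  if (v.words !== undefined && !(Array.isArray(v.words) && v.words.every(isWordItem))) {
    return false;
  }
  return true;
}
```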