doctorita-transcribe API

HTTP, WebSocket, and Server-Sent Events API for live PCM transcription, transcript snapshots, audio retrieval, and speaker labeling.

This documentation is written for API consumers. It describes routes, request payloads, response schemas, streaming protocols, authentication rules, validation behavior, and error responses.

  • Endpoint contract

    Public /v1/* routes, protected integration routes, request bodies, query parameters, and status codes.

    Endpoint index

  • Streaming protocols

    WebSocket PCM ingest plus transcript, inspect, and log Server-Sent Events streams.

    Live audio WebSocket

  • Schemas

    Transcript snapshots, words, segments, ASR config, speaker profiles, log entries, and inspect events.

    Schema reference

API Overview

| Area | API surface |
|------|-------------|
| Health | GET /healthz |
| Session lifecycle | POST /v1/sessions, POST /v1/sessions/{session_id}/stop |
| Live audio | GET /v1/sessions/{session_id}/audio/ws |
| Transcript access | GET /v1/sessions/{session_id}/transcript, GET /v1/sessions/{session_id}/events |
| Audio retrieval | GET /v1/sessions/{session_id}/recording, GET /v1/sessions/{session_id}/inspect/audio |
| Diagnostics | GET /v1/sessions/{session_id}/inspect/events, GET /v1/logs/events |
| ASR tuning | GET /v1/sessions/{session_id}/asr-config, PATCH /v1/sessions/{session_id}/asr-config |
| Speakers | GET /v1/speakers, POST /v1/speakers, POST /v1/sessions/{session_id}/speaker-identification |
| Batch comparison | POST /v1/transcriptions/elevenlabs |
| Protected integration routes | /api/v1/transcribe-live/sessions/{session_id}/* |
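
Most routes in the overview are templates scoped by {session_id}. As a rough illustration, a client might expand them with a small helper like the one below. This is a sketch, not part of the API: the ROUTES keys and the expand_route function are hypothetical names, and percent-encoding the session ID is a defensive assumption rather than documented behavior.

```python
from urllib.parse import quote

# A subset of the session-scoped routes from the overview.
# The dictionary keys are hypothetical labels chosen for this sketch.
ROUTES = {
    "stop": ("POST", "/v1/sessions/{session_id}/stop"),
    "audio_ws": ("GET", "/v1/sessions/{session_id}/audio/ws"),
    "transcript": ("GET", "/v1/sessions/{session_id}/transcript"),
    "events": ("GET", "/v1/sessions/{session_id}/events"),
    "recording": ("GET", "/v1/sessions/{session_id}/recording"),
    "asr_config": ("GET", "/v1/sessions/{session_id}/asr-config"),
}


def expand_route(name: str, session_id: str) -> tuple[str, str]:
    """Return (HTTP method, path) for a named route and a session ID."""
    method, template = ROUTES[name]
    # Assumption: percent-encode the ID so unexpected characters
    # cannot alter the path structure.
    return method, template.format(session_id=quote(session_id, safe=""))


print(expand_route("stop", "s1"))
```

The returned path still has to be joined with the service's base URL, which this page does not specify.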

Live Session Flow

  1. Create a session with POST /v1/sessions.
  2. Subscribe to transcript events with GET /v1/sessions/{session_id}/events.
  3. Open the live audio WebSocket at GET /v1/sessions/{session_id}/audio/ws.
  4. Send the required JSON start frame.
  5. Stream 16 kHz mono PCM16 little-endian binary audio frames.
  6. Close the WebSocket when capture ends.
  7. Stop the session with POST /v1/sessions/{session_id}/stop.
  8. Fetch the final transcript with GET /v1/sessions/{session_id}/transcript?consistency=FINAL.
  9. Optionally run speaker identification or retrieve the session recording.
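
To make the audio format in step 5 and the query in step 8 concrete, the sketch below generates one second of 16 kHz mono PCM16 little-endian audio, splits it into binary frames, and builds the final-transcript path. It deliberately performs no network I/O. The 20 ms frame size, the test tone, and all function names are assumptions made for illustration; the API reference defines the actual frame-size expectations and the JSON start frame, which is not shown here.

```python
import math
import struct
from urllib.parse import urlencode

SAMPLE_RATE_HZ = 16_000  # 16 kHz, per step 5
BYTES_PER_SAMPLE = 2     # PCM16 mono: one 2-byte sample per tick


def sine_pcm16le(duration_s: float, freq_hz: float = 440.0) -> bytes:
    """Generate a quiet test tone as little-endian signed 16-bit samples."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    samples = [
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n)
    ]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{n}h", *samples)


def pcm_frames(pcm: bytes, frame_ms: int = 20):
    """Yield fixed-duration chunks suitable for binary WebSocket frames.

    Assumption: 20 ms per frame; the last chunk may be shorter.
    """
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


def final_transcript_path(session_id: str) -> str:
    """Build the path used in step 8."""
    query = urlencode({"consistency": "FINAL"})
    return f"/v1/sessions/{session_id}/transcript?{query}"


audio = sine_pcm16le(1.0)
frames = list(pcm_frames(audio))
print(len(audio), len(frames), len(frames[0]))  # prints: 32000 50 640
print(final_transcript_path("s1"))
```

In a real client, each chunk yielded by pcm_frames would be sent as one binary WebSocket message after the start frame, using whichever WebSocket library the client already depends on.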

Start Here

Use the API reference for the complete contract. It includes copyable examples, stream event formats, validation rules, authentication details for integration routes, and all shared JSON schemas.