PANAKOES ALL-HEARING / CLOUD AUDIO

00. Project bulletin / open-source release v0.1 / pre-alpha

Cloud audio capture, transcription, and AI insights.

Panakoes is the open-source backend for capturing audio of any duration, transcribing it on GPU, and surfacing AI-powered summaries and action items. Built on AWS Fargate, Step Functions, and Whisper. MIT licensed.


01. Capabilities / what it does

Three audio paths, one backend.

A1 / async

Async transcription

Upload audio of any duration. AWS Batch spins up an EC2 g4dn.xlarge Spot GPU instance, runs Whisper-large-v3 at fp16, and returns a transcript for pennies per audio hour. A Step Functions fan-out splits anything over ten minutes into chunks.
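As a rough illustration of the chunking step, the sketch below plans ten-minute windows for a recording. Only the ten-minute threshold comes from the description above; the `Chunk` type, the `plan_chunks` name, and the absence of overlap between windows are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass

CHUNK_SECONDS = 600.0  # ten minutes, the threshold named above


@dataclass(frozen=True)
class Chunk:
    """One window of a recording (hypothetical type for this sketch)."""
    index: int
    start_s: float
    end_s: float


def plan_chunks(duration_s: float, chunk_s: float = CHUNK_SECONDS) -> list[Chunk]:
    """Split a recording into consecutive windows of at most chunk_s seconds."""
    total = float(duration_s)
    if total <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks: list[Chunk] = []
    start = 0.0
    while start < total:
        end = min(start + chunk_s, total)
        chunks.append(Chunk(index=len(chunks), start_s=start, end_s=end))
        start = end
    return chunks
```

A list like this could serve as the item array for a Step Functions Map state, with one transcription job per chunk. Audio of exactly ten minutes or less yields a single chunk, so short uploads skip the fan-out.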

B1 / live

Live streaming

Each session spawns a g4dn.xlarge instance from a custom AMI that runs faster-whisper-large with Silero VAD and streams partial transcripts over a WebSocket. Sub-second latency once the instance is warm.
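One way to picture the VAD-gated streaming loop is sketched below. It is not the project's implementation: `is_speech` is a stand-in for a voice activity detector such as Silero VAD, `transcribe` is a stand-in for the speech model, and the JSON message shape (`type`, `text`) is a hypothetical wire format.

```python
import json
from typing import Callable, Iterable, Iterator


def stream_partials(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],   # stand-in for a VAD (assumption)
    transcribe: Callable[[bytes], str],   # stand-in for the speech model (assumption)
) -> Iterator[str]:
    """Yield a 'partial' message per speech frame and a 'final' when speech ends."""
    buffer = bytearray()
    for frame in frames:
        if is_speech(frame):
            # Speech continues: grow the utterance and re-transcribe it.
            buffer.extend(frame)
            yield json.dumps({"type": "partial", "text": transcribe(bytes(buffer))})
        elif buffer:
            # Silence after speech: close the utterance and reset.
            yield json.dumps({"type": "final", "text": transcribe(bytes(buffer))})
            buffer.clear()
    if buffer:
        # Stream ended mid-utterance: flush what is left.
        yield json.dumps({"type": "final", "text": transcribe(bytes(buffer))})
```

Each yielded string would be sent as one WebSocket text message. Gating on the VAD keeps the model from running on silence, which matters when every session holds its own GPU.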

C1 / summarize

AI summarization

Claude Haiku 4.5 handles standard summaries on the free tier; Claude Sonnet 4.6 powers the paid "deep summary" feature. A pluggable model interface lets the next state-of-the-art model swap in without a rewrite.
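A pluggable model interface of this kind might look like the sketch below. The `Summarizer` protocol, the `SummarizerRegistry` class, and the tier names are all hypothetical, and `FirstSentenceSummarizer` is an offline stand-in so the example runs without an API key; a real backend would call the chosen model instead.

```python
from typing import Protocol


class Summarizer(Protocol):
    """Anything that turns a transcript into a summary (hypothetical interface)."""

    def summarize(self, transcript: str) -> str: ...


class FirstSentenceSummarizer:
    """Offline stand-in: returns the first sentence of the transcript."""

    def summarize(self, transcript: str) -> str:
        return transcript.split(".")[0].strip() + "."


class SummarizerRegistry:
    """Maps a product tier to whichever backend currently serves it."""

    def __init__(self) -> None:
        self._by_tier: dict[str, Summarizer] = {}

    def register(self, tier: str, summarizer: Summarizer) -> None:
        self._by_tier[tier] = summarizer

    def for_tier(self, tier: str) -> Summarizer:
        if tier not in self._by_tier:
            raise KeyError(f"no summarizer registered for tier {tier!r}")
        return self._by_tier[tier]


registry = SummarizerRegistry()
registry.register("standard", FirstSentenceSummarizer())  # tier name is an assumption
```

Swapping in a new model then means registering a different object under the same tier; callers that use `registry.for_tier("standard").summarize(...)` do not change.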


02. Architecture / event-driven on AWS

An event-driven backbone, audit-ready by default.

Two parallel paths share one pluggable transcription abstraction. The async path runs through S3, Lambda, AWS Batch, and DynamoDB Streams. The streaming path runs through an API Gateway WebSocket, a session manager, and per-session GPU instances. Observability comes from CloudWatch and X-Ray, with OpenTelemetry instrumentation throughout.

Full architecture write-up

Figure: simplified Panakoes architecture. The client uploads to S3 or opens a WebSocket; the async path runs Lambda, then AWS Batch GPU, then the Claude summarizer; the streaming path runs the session manager, then a per-session GPU instance; both paths land in RDS metadata and emit OpenTelemetry traces to CloudWatch.
Simplified flow. Detailed diagram in the architecture doc.
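To make the first hop of the async path concrete, the sketch below turns an S3 upload event into the arguments for an AWS Batch job. The event layout and the `submit_job` parameter names follow the public S3 notification and boto3 Batch formats; the queue name, job definition name, and environment variable names are placeholders invented for this example, since the real ones are not given here.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical resource names; the project's real values are not stated above.
JOB_QUEUE = "panakoes-gpu-spot"
JOB_DEFINITION = "panakoes-whisper"


def batch_job_request(s3_event: dict) -> dict:
    """Build keyword arguments for boto3's Batch submit_job from an S3 event."""
    record = s3_event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications arrive URL-encoded.
    key = unquote_plus(record["s3"]["object"]["key"])
    # Batch job names allow only letters, digits, hyphens, and underscores.
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", key)
    return {
        "jobName": f"transcribe-{safe}"[:128],
        "jobQueue": JOB_QUEUE,
        "jobDefinition": JOB_DEFINITION,
        "containerOverrides": {
            "environment": [
                {"name": "AUDIO_BUCKET", "value": bucket},  # variable names are assumptions
                {"name": "AUDIO_KEY", "value": key},
            ]
        },
    }
```

A Lambda handler could then call `boto3.client("batch").submit_job(**batch_job_request(event))`. Keeping the request builder free of AWS calls makes it easy to unit-test without credentials.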

03. Get involved / open source