Async transcription
Upload audio of any duration. AWS Batch spins up an EC2 g4dn.xlarge Spot GPU instance, runs Whisper-large-v3 at fp16, and returns a transcript for pennies per audio hour. Step Functions splits anything over ten minutes into chunks and fans them out.
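As a rough illustration of the ten-minute fan-out, the sketch below plans chunk boundaries for a recording of known length. The names `Chunk`, `plan_chunks`, and `CHUNK_SECONDS` are hypothetical and not taken from the Panakoes codebase; the real pipeline may split on silence or use overlapping windows instead of fixed cuts.

```python
from dataclasses import dataclass

# Assumption: chunks are fixed ten-minute windows, matching the threshold above.
CHUNK_SECONDS = 600.0


@dataclass(frozen=True)
class Chunk:
    index: int    # position of the chunk in the recording
    start: float  # start offset in seconds
    end: float    # end offset in seconds


def plan_chunks(duration_s: float, chunk_s: float = CHUNK_SECONDS) -> list[Chunk]:
    """Return the time windows a fan-out step could hand to parallel workers."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Short recordings stay in one piece; no fan-out is needed.
    if duration_s <= chunk_s:
        return [Chunk(0, 0.0, duration_s)]
    chunks: list[Chunk] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append(Chunk(len(chunks), start, end))
        start = end
    return chunks
```

A 25-minute recording (1500 seconds) would yield three windows under these assumptions: two full ten-minute chunks and one five-minute remainder. Each window could then become one item in a Step Functions Map state.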
Panakoes is the open-source backend for capturing audio at any duration, transcribing it on GPU, and surfacing AI-powered summaries and action items. Built on AWS Fargate, Step Functions, and Whisper. MIT licensed.
A g4dn.xlarge instance, spawned per session from a custom AMI, runs faster-whisper-large with Silero VAD and streams partial transcripts over a WebSocket. Sub-second latency once the instance is warm.
Claude Haiku 4.5 handles standard summaries on the free tier; Claude Sonnet 4.6 powers the paid "deep summary" feature. A pluggable model interface lets the next state-of-the-art model swap in without a rewrite.
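One way a pluggable model interface with tier routing could look is sketched below. Everything here is an assumption for illustration: `Summarizer`, `StubSummarizer`, `REGISTRY`, and `pick_summarizer` are hypothetical names, and the stub truncates text locally instead of calling any model API.

```python
from typing import Protocol


class Summarizer(Protocol):
    """Anything with a name and a summarize() method can be plugged in."""

    name: str

    def summarize(self, transcript: str) -> str: ...


class StubSummarizer:
    """Stand-in backend: keeps the first few words instead of calling a model."""

    def __init__(self, name: str, max_words: int) -> None:
        self.name = name
        self.max_words = max_words

    def summarize(self, transcript: str) -> str:
        return " ".join(transcript.split()[: self.max_words])


# Hypothetical registry; swapping a model means replacing one entry here.
REGISTRY: dict[str, Summarizer] = {
    "standard": StubSummarizer("Claude Haiku 4.5", max_words=5),
    "deep": StubSummarizer("Claude Sonnet 4.6", max_words=20),
}


def pick_summarizer(paid: bool, deep: bool) -> Summarizer:
    """Route paid deep-summary requests to the larger model, all else to standard."""
    return REGISTRY["deep" if (paid and deep) else "standard"]
```

Because callers only depend on the `Summarizer` protocol, a newer model would need a new class and a registry entry, with no change to the routing code.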
Two parallel paths share one pluggable transcription abstraction. Async runs through S3, Lambda, AWS Batch, and DynamoDB streams. Streaming runs through API Gateway WebSocket, a session manager, and per-session GPU instances. Observability via CloudWatch and X-Ray with OpenTelemetry instrumentation throughout.
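The idea of two paths sharing one transcription abstraction can be sketched as a single interface that yields text segment by segment: the async path collects everything, while the streaming path forwards each partial as it arrives. `Transcriber`, `FakeTranscriber`, `run_async`, and `run_streaming` are hypothetical names, and the fake backend returns placeholder text instead of running Whisper.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator


class Transcriber(ABC):
    """Shared abstraction: audio chunks in, transcript segments out."""

    @abstractmethod
    def transcribe(self, audio_chunks: Iterable[bytes]) -> Iterator[str]: ...


class FakeTranscriber(Transcriber):
    """Placeholder backend; a real one would run a speech model on each chunk."""

    def transcribe(self, audio_chunks: Iterable[bytes]) -> Iterator[str]:
        for i, chunk in enumerate(audio_chunks):
            yield f"[segment {i}: {len(chunk)} bytes]"


def run_async(transcriber: Transcriber, chunks: Iterable[bytes]) -> str:
    """Async path: consume every segment, then return the full transcript."""
    return " ".join(transcriber.transcribe(chunks))


def run_streaming(
    transcriber: Transcriber,
    chunks: Iterable[bytes],
    send: Callable[[str], None],
) -> None:
    """Streaming path: push each partial transcript to the client as it arrives."""
    for partial in transcriber.transcribe(chunks):
        send(partial)
```

In this sketch `send` stands in for whatever writes to the WebSocket connection; the transcriber itself does not know which path is calling it.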
Public on GitHub. Issues, discussions, and pull requests welcome. MIT license.
github.com/Aztec03hub/panakoes
Detailed write-up of services, data flow, and AWS resource map, plus ADRs for every locked decision.
Read the architecture
Panakoes is the first open-source project from LaFayette Labs LLC, a one-principal engineering studio in Chicago.
lafayettelabs.com