In today’s fast-paced digital world, real-time audio transcription is a game-changer for applications ranging from live captioning to voice-driven interfaces. Deepgram, a leading speech-to-text API provider, uses AI models such as Nova-3 to deliver accurate, low-latency transcription. This article guides you through setting up a Node.js project to integrate Deepgram’s real-time transcription capabilities, building on your existing setup with index.html, a JavaScript file, and the dependencies @deepgram/sdk, dotenv, and ws. We’ll cover account setup, API key creation, project configuration, code implementation, and the pros and cons of using Deepgram, including its $200 free credit offer.
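Before integrating Deepgram, you need an account and an API key. Once you have created a key, store it in a .env file at the project root so it stays out of your source code (and add .env to .gitignore so the key is never committed):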
DEEPGRAM_API_KEY=your_api_key_here
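A .env file is just a list of KEY=value lines. To illustrate what dotenv does with the line above, here is a deliberately simplified parser. It is a sketch for explanation only: parseEnvLines is a hypothetical helper, not dotenv’s implementation, which also handles quoting, escapes, and multiline values.

```javascript
// Simplified illustration of how .env lines map to key/value pairs.
// parseEnvLines is a hypothetical helper, NOT the dotenv library's API.
function parseEnvLines(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // ignore lines without "="
    // Split on the FIRST "=" so values may themselves contain "="
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    result[key] = value;
  }
  return result;
}

console.log(parseEnvLines("# Deepgram\nDEEPGRAM_API_KEY=your_api_key_here\n"));
// prints { DEEPGRAM_API_KEY: 'your_api_key_here' }
```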
Your Node.js project already includes index.html, a JavaScript file, and the necessary dependencies. Let’s make sure everything is configured correctly. If any dependency is missing, install it:
npm install @deepgram/sdk dotenv ws
Your package.json should include:
"dependencies": {
"@deepgram/sdk": "^3.12.1",
"dotenv": "^16.5.0",
"ws": "^8.18.1"
}
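If you want to confirm that package.json actually lists these runtime dependencies, a quick check like the one below works on the parsed file. This is a hypothetical helper (findMissingDependencies is not part of npm or any listed package), and it only checks names, not version ranges.

```javascript
// Hypothetical sanity check: report required packages missing from the
// "dependencies" section of a parsed package.json. Versions are not checked.
function findMissingDependencies(pkg, required) {
  const deps = pkg.dependencies || {};
  return required.filter(
    (name) => !Object.prototype.hasOwnProperty.call(deps, name)
  );
}

// In a real project you could pass require("./package.json") instead.
const examplePkg = {
  dependencies: { "@deepgram/sdk": "^3.12.1", dotenv: "^16.5.0" },
};
console.log(findMissingDependencies(examplePkg, ["@deepgram/sdk", "dotenv", "ws"]));
// prints [ 'ws' ]
```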
In your JavaScript file, load the .env file using dotenv:
require('dotenv').config();
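To get a clear error message as early as possible, you might check the key right after loading .env, before creating the Deepgram client. The requireEnv helper below is a hypothetical sketch; it accepts an env object so it can be tested without touching process.env.

```javascript
// Hypothetical helper: read a required variable and fail fast with a clear
// message if it is missing or blank.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// After require('dotenv').config() has run, you could write:
// const apiKey = requireEnv("DEEPGRAM_API_KEY");
try {
  requireEnv("DEEPGRAM_API_KEY", {});
} catch (err) {
  console.log(err.message); // prints "Missing required environment variable: DEEPGRAM_API_KEY"
}
```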
Ensure your project has the following structure:
project-root/
├── index.html
├── script.js
├── .env
├── package.json
└── node_modules/
With your project set up, you can now integrate Deepgram’s SDK to enable real-time audio transcription.
In script.js, import and initialize the Deepgram client:
const { createClient } = require('@deepgram/sdk');
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
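Create a live transcription instance using the Nova-3 model, then forward audio chunks that arrive over a WebSocket to it (replace the placeholder URL with the server that supplies your audio):
const live = deepgram.listen.live({ model: 'nova-3' });

const WebSocket = require('ws');
// Placeholder URL: point this at your own audio source
const ws = new WebSocket('ws://your-websocket-server');

ws.on('message', (message) => {
  live.send(message); // Forward each audio chunk to Deepgram
});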
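Audio can start arriving before the Deepgram connection reports that it is open. This article does not rely on whether the SDK queues early chunks internally; if you want explicit control, one option is a small queue that holds chunks until you mark the upstream as open. createBufferedSender is a hypothetical helper: you would call open() from the LiveTranscriptionEvents.Open handler and pass (chunk) => live.send(chunk) as the sender.

```javascript
// Hypothetical helper: queue audio chunks until the upstream connection is
// open, then flush them in order and send later chunks straight through.
// Chunks are never dropped or reordered: with MediaRecorder/WebM, the first
// chunk carries the container header that later chunks depend on.
function createBufferedSender(send) {
  let isOpen = false;
  const queue = [];
  return {
    push(chunk) {
      if (isOpen) {
        send(chunk);
      } else {
        queue.push(chunk);
      }
    },
    open() {
      isOpen = true;
      while (queue.length > 0) send(queue.shift());
    },
    queued() {
      return queue.length;
    },
  };
}

// Usage with a stand-in sender that just records what was sent
const sent = [];
const sender = createBufferedSender((chunk) => sent.push(chunk));
sender.push("a");
sender.push("b");
sender.open(); // flushes "a" and "b"
sender.push("c"); // sent immediately
console.log(sent); // prints [ 'a', 'b', 'c' ]
```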
Next, update index.html so the browser captures microphone audio and streams it to a local WebSocket server on port 8080:
<!DOCTYPE html>
<html>
<head>
<title>Live Mic Transcription</title>
</head>
<body>
<h1>🎙️ Speak and See Live Transcription</h1>
<pre id="transcript">Waiting for transcription...</pre>
<script>
const transcriptEl = document.getElementById('transcript');
const ws = new WebSocket('ws://localhost:8080');
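// Start recording only after the socket is open, so the first chunk
// (which carries the WebM header) is never dropped
ws.addEventListener("open", () => {
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      // Record the microphone as WebM audio
      const mediaRecorder = new MediaRecorder(stream, {
        mimeType: 'audio/webm'
      });
      // Forward each non-empty chunk to the server
      mediaRecorder.addEventListener("dataavailable", event => {
        if (event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
          ws.send(event.data);
        }
      });
      // Emit a chunk every 250 ms
      mediaRecorder.start(250);
    })
    .catch(err => {
      // Permission denied or no microphone available
      transcriptEl.textContent = "Microphone error: " + err.message;
    });
});
// Append each transcript the server sends back on a new line
ws.onmessage = (msg) => {
  const data = JSON.parse(msg.data);
  if (data.transcript) {
    transcriptEl.textContent += "\n" + data.transcript;
  }
};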
</script>
</body>
</html>
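The page hard-codes mimeType: 'audio/webm', which not every browser’s MediaRecorder accepts (Safari, for example, has historically supported audio/mp4 rather than audio/webm). A small, hypothetical helper can pick the first supported type; in the browser you would pass a predicate that calls MediaRecorder.isTypeSupported. The candidate list is an assumption: adjust it to the formats you need, and confirm Deepgram accepts the container you end up sending.

```javascript
// Hypothetical helper: return the first MIME type the predicate accepts,
// or undefined so the caller can fall back to the browser's default.
function pickMimeType(isSupported, candidates = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
]) {
  return candidates.find((type) => isSupported(type));
}

// Usage with a stand-in predicate (no browser needed)
const onlyMp4 = (type) => type === "audio/mp4";
console.log(pickMimeType(onlyMp4)); // prints audio/mp4

// In index.html (sketch):
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
// const mediaRecorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
```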
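In script.js, add the server-side relay: a WebSocket server on port 8080 that forwards each audio chunk from the browser to Deepgram and sends transcripts back: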
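require("dotenv").config();
const WebSocket = require("ws");
const { createClient, LiveTranscriptionEvents } = require("@deepgram/sdk");

// Local WebSocket server that index.html connects to
const wss = new WebSocket.Server({ port: 8080 });
console.log("starting the server...");

wss.on("connection", (ws) => {
  // One Deepgram live connection per browser client
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
  const dgConnection = deepgram.listen.live({
    model: "nova-2-medical", // medical-tuned model; "nova-3" suits general speech
    language: "en",
    smart_format: true,
    measurements: true,
    numerals: true,
  });

  dgConnection.on(LiveTranscriptionEvents.Open, () => {
    console.log("Deepgram connection opened.");
  });

  // Relay each non-empty transcript back to the browser as JSON
  dgConnection.on(LiveTranscriptionEvents.Transcript, (data) => {
    const transcript = data.channel?.alternatives?.[0]?.transcript;
    if (transcript && ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ transcript }));
    }
  });

  dgConnection.on(LiveTranscriptionEvents.Error, console.error);

  dgConnection.on(LiveTranscriptionEvents.Close, () => {
    console.log("Deepgram connection closed.");
  });

  // Forward raw audio chunks from the browser to Deepgram
  ws.on("message", (msg) => {
    dgConnection.send(msg);
  });

  // Close the Deepgram stream when the browser disconnects
  ws.on("close", () => {
    dgConnection.finish();
    console.log("WebSocket closed");
  });
});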
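The Transcript handler reads data.channel.alternatives[0].transcript. Deepgram’s live results also carry an is_final flag, which becomes relevant if you enable the interim_results option. A hypothetical helper that extracts both might look like this; the sample payload is trimmed and illustrative, and real messages contain more fields.

```javascript
// Hypothetical helper: pull the top transcript and finality flag out of a
// Deepgram live "Results" payload. Returns null for empty or unexpected data.
function extractTranscript(data) {
  const alternative = data?.channel?.alternatives?.[0];
  const text = alternative?.transcript?.trim();
  if (!text) return null;
  return { text, isFinal: Boolean(data.is_final) };
}

// Trimmed, illustrative payload
const sample = {
  type: "Results",
  is_final: true,
  channel: { alternatives: [{ transcript: "hello world", confidence: 0.98 }] },
};
console.log(extractTranscript(sample)); // prints { text: 'hello world', isFinal: true }
```

Inside the Transcript handler you could then forward only final results, for example: const result = extractTranscript(data); if (result && result.isFinal) ws.send(JSON.stringify({ transcript: result.text }));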
Deepgram offers several advantages but also has some limitations.
Deepgram’s speech-to-text API empowers developers to build real-time transcription features with ease. Its high accuracy, generous free credit, and robust SDK make it an excellent choice for Node.js projects. By following this guide, you can set up Deepgram, stream audio, and process transcriptions seamlessly. Explore Deepgram’s documentation for advanced features and start building innovative voice-driven applications today.
Happy coding!
If you enjoyed this post, consider sharing it on social media. For more articles on JavaScript and modern web development, follow me on Medium and Stack Overflow, and check out my GitHub repositories.