
Real-Time Audio Transcription with Deepgram’s Speech-to-Text API

Real-time audio transcription powers applications ranging from live captioning to voice-driven interfaces. Deepgram, a speech-to-text API provider, offers AI models such as Nova-3 for accurate, low-latency transcription. This article walks you through setting up a Node.js project that uses Deepgram’s real-time transcription, building on your existing setup with index.html, a JavaScript file, and three dependencies (@deepgram/sdk, dotenv, ws). We’ll cover account setup, API key creation, project configuration, code implementation, and the pros and cons of using Deepgram, including its $200 free credit offer.

Step 1: Sign Up for Deepgram and Create an API Key

Before integrating Deepgram, you need an account and an API key.

Create a Deepgram Account

  • Visit Deepgram’s website and sign up for a free account.
  • Upon registration, you receive $200 in free credit, equivalent to approximately 45,000 minutes of transcription, with no credit card required. This makes it ideal for testing and development.
  • After signing up, create an API key and store it in a .env file as DEEPGRAM_API_KEY. The next two subsections cover both steps.
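To put the free credit in perspective, the figures quoted above imply a rough per-minute rate. The snippet below is only a back-of-the-envelope sketch based on this article's approximation ($200 for about 45,000 minutes). Actual pricing depends on the model and plan, and estimateCostUsd is a hypothetical helper name.

```javascript
// Figures quoted in this article; real pricing varies by model and plan.
const FREE_CREDIT_USD = 200;
const APPROX_FREE_MINUTES = 45000;

// Implied cost per transcribed minute under the article's approximation.
const impliedRatePerMinute = FREE_CREDIT_USD / APPROX_FREE_MINUTES;

// Estimate how much credit a given number of audio minutes would consume.
function estimateCostUsd(minutes, ratePerMinute = impliedRatePerMinute) {
  return minutes * ratePerMinute;
}

console.log(impliedRatePerMinute.toFixed(4)); // "0.0044"
console.log(estimateCostUsd(600).toFixed(2)); // "2.67" for 10 hours of audio
```

Under this approximation, ten hours of test audio would use well under $3 of the credit.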

Create an API Key

  • Log in to the Deepgram Console.
  • From the “Projects” drop-down menu on the top-left, select your project (a default project is created upon signup).
  • Navigate to Settings > API Keys.
  • Click Create a New API Key.
  • Enter a friendly name (e.g., Node Transcription Key), choose permissions (e.g., usage:write for transcription), set an expiration date if desired, and add optional tags for organization.
  • Copy the API key and store it securely, as it cannot be viewed again after creation.

Store the API Key

  • Create a .env file in your project’s root directory.
  • Add the API key:

DEEPGRAM_API_KEY=your_api_key_here

Step 2: Set Up Your Node.js Project

Your Node.js project already includes index.html, a JavaScript file, and the necessary dependencies. Let’s ensure everything is configured correctly.

Install Dependencies

npm install @deepgram/sdk dotenv ws

Your package.json should include:

"dependencies": {
  "@deepgram/sdk": "^3.12.1",
  "dotenv": "^16.5.0",
  "ws": "^8.18.1"
}

Configure Environment Variables

In your JavaScript file, load the .env file using dotenv:

require('dotenv').config();
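
If the .env file is missing or the variable is misspelled, process.env.DEEPGRAM_API_KEY is simply undefined and the failure only surfaces later, when Deepgram rejects the connection. A minimal sketch of a fail-fast check follows. requireEnv is a hypothetical helper, not part of dotenv or the Deepgram SDK.

```javascript
// Hypothetical helper: return the variable's value or throw a clear error.
// `env` defaults to process.env but can be swapped out for testing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Intended usage, after require('dotenv').config() has run:
//   const apiKey = requireEnv('DEEPGRAM_API_KEY');
```

Calling this once at startup stops the server immediately with a readable message instead of failing on the first audio chunk.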

Project Structure

Ensure your project has the following structure:

project-root/
├── index.html
├── script.js
├── .env
├── package.json
└── node_modules/

Step 3: Integrate Deepgram SDK for Real-Time Transcription

With your project set up, you can now integrate Deepgram’s SDK to enable real-time audio transcription.

Initialize the Deepgram Client

In script.js, import and initialize the Deepgram client:

const { createClient } = require('@deepgram/sdk');
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

Set Up Live Transcription

Create a live transcription instance using the Nova-3 model:

const live = deepgram.listen.live({ model: 'nova-3' });
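
The options passed to listen.live() correspond to query parameters on Deepgram’s streaming endpoint. The SDK builds the connection for you, so the following is only an illustration of that mapping. buildListenUrl is a hypothetical helper, and the exact parameters the SDK sends may differ.

```javascript
// Illustration only: turn an options object into a streaming URL with
// query parameters. The SDK does this internally.
function buildListenUrl(options, base = 'wss://api.deepgram.com/v1/listen') {
  const url = new URL(base);
  for (const [key, value] of Object.entries(options)) {
    // Booleans and numbers are serialized as plain strings.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

console.log(buildListenUrl({ model: 'nova-3', smart_format: true }));
// wss://api.deepgram.com/v1/listen?model=nova-3&smart_format=true
```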

Stream Audio via WebSocket

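const WebSocket = require('ws');

// Placeholder URL: replace it with the WebSocket endpoint that supplies audio.
const ws = new WebSocket('ws://your-websocket-server');

// Forward each incoming audio chunk to the Deepgram live connection.
ws.on('message', (message) => {
  live.send(message);
});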
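
Audio can start arriving before the upstream connection has finished opening. One way to avoid losing those first chunks is to queue them and flush once the connection is ready. The sketch below shows the idea in isolation. createBufferedSender is a hypothetical helper, and the SDK may already buffer for you, so treat it as an optional safeguard.

```javascript
// Hypothetical helper: hold chunks until the upstream connection is open,
// then flush them in arrival order and forward later chunks directly.
function createBufferedSender(send) {
  const pending = [];
  let open = false;

  return {
    // Forward immediately when open; otherwise queue the chunk.
    push(chunk) {
      if (open) send(chunk);
      else pending.push(chunk);
    },
    // Call this from the connection's "open" handler.
    markOpen() {
      open = true;
      while (pending.length > 0) send(pending.shift());
    },
    // Number of chunks still waiting to be sent.
    get queued() {
      return pending.length;
    },
  };
}
```

In the snippet above, you would pass (chunk) => live.send(chunk) as the send function, call push() from the message handler, and call markOpen() when the live connection reports that it is open.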

Basic HTML Interface

Replace the contents of index.html with the following page, which captures microphone audio and sends it to the local WebSocket server:


<!DOCTYPE html>
<html>
<head>
  <title>Live Mic Transcription</title>
</head>
<body>
  <h1>🎙️ Speak and See Live Transcription</h1>
  <pre id="transcript">Waiting for transcription...</pre>

  <script>
    const transcriptEl = document.getElementById('transcript');
    const ws = new WebSocket('ws://localhost:8080');

    navigator.mediaDevices.getUserMedia({ audio: true })
      .then(stream => {
        const mediaRecorder = new MediaRecorder(stream, {
          mimeType: 'audio/webm'
        });

        mediaRecorder.addEventListener("dataavailable", event => {
          if (event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
            ws.send(event.data);
          }
        });

        mediaRecorder.start(250);
      });

    ws.onmessage = (msg) => {
      const data = JSON.parse(msg.data);
      if (data.transcript) {
        transcriptEl.textContent += "\n" + data.transcript;
      }
    };
  </script>
</body>
</html>
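
The page above hard-codes audio/webm, which not every browser can record. A small sketch of picking the first supported format follows. pickMimeType and the candidate list are assumptions for illustration. In the browser, the support check would be MediaRecorder.isTypeSupported.

```javascript
// Hypothetical helper: return the first candidate the browser can record,
// or null when none is supported.
function pickMimeType(candidates, isSupported) {
  return candidates.find((type) => isSupported(type)) ?? null;
}

// Intended browser usage (MediaRecorder is not available in Node.js):
//   const mimeType = pickMimeType(
//     ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'],
//     (type) => MediaRecorder.isTypeSupported(type)
//   );
//   const recorder = mimeType
//     ? new MediaRecorder(stream, { mimeType })
//     : new MediaRecorder(stream); // fall back to the browser default
```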
  

In script.js, add the server logic that receives audio from the browser, forwards it to Deepgram, and sends transcripts back:

require("dotenv").config();
const WebSocket = require("ws");
const { createClient, LiveTranscriptionEvents } = require("@deepgram/sdk");

const wss = new WebSocket.Server({ port: 8080 });
console.log("starting the server...");

wss.on("connection", (ws) => {
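  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

  // Open a live transcription connection for this browser client.
  const dgConnection = deepgram.listen.live({
    model: "nova-3", // same model as the earlier snippet in this article
    language: "en",
    smart_format: true,
    measurements: true,
    numerals: true,
  });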

  dgConnection.on(LiveTranscriptionEvents.Open, () => {
    console.log("Deepgram connection opened.");
  });

  dgConnection.on(LiveTranscriptionEvents.Transcript, (data) => {
    const transcript = data.channel.alternatives[0].transcript;
    if (transcript) {
      ws.send(JSON.stringify({ transcript }));
    }
  });

  dgConnection.on(LiveTranscriptionEvents.Error, console.error);

  ws.on("message", (msg) => {
    dgConnection.send(msg);
  });

  ws.on("close", () => {
    dgConnection.finish();
    console.log("WebSocket closed");
  });
});
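
The Transcript handler above reads data.channel.alternatives[0] directly, which throws if a result arrives without alternatives. The sketch below is a more defensive variant. It assumes the result payload also carries an is_final flag, as in Deepgram’s streaming responses, and extractTranscript is a hypothetical helper name.

```javascript
// Hypothetical helper: safely pull the transcript text out of a live result.
// Returns null for empty or malformed results.
function extractTranscript(data) {
  const text = data?.channel?.alternatives?.[0]?.transcript;
  if (typeof text !== 'string' || text.trim() === '') {
    return null;
  }
  // is_final is assumed to mark results that will not be revised further.
  return { text, isFinal: data.is_final === true };
}

// Intended usage inside the Transcript handler:
//   const result = extractTranscript(data);
//   if (result) ws.send(JSON.stringify({ transcript: result.text }));
```

If you later enable interim results, the isFinal flag lets the page replace provisional text instead of appending every revision.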


Pros and Cons of Deepgram

Deepgram offers several advantages but also has some limitations.

Pros

  • High Accuracy: Nova-3 model delivers up to 30% better accuracy than competitors.
  • Free Credit: $200 free credit (approx. 45,000 minutes) with no credit card required.
  • Real-Time: Supports live transcription for dynamic applications.
  • Scalability: Suitable for both small projects and enterprise use.

Cons

  • Cost: Pay-as-you-go pricing can add up after free credit is used.
  • Internet Dependency: Requires stable connectivity for real-time transcription.
  • Learning Curve: Streaming audio setup may be complex for beginners.

Conclusion

Deepgram’s speech-to-text API empowers developers to build real-time transcription features with ease. Its high accuracy, generous free credit, and robust SDK make it an excellent choice for Node.js projects. By following this guide, you can set up Deepgram, stream audio, and process transcriptions seamlessly. Explore Deepgram’s documentation for advanced features and start building innovative voice-driven applications today.

Happy coding!


If you enjoyed this post, consider sharing it on social media. For more articles on JavaScript and modern web development, follow me on Medium and Stack Overflow, and check out my GitHub repositories.