Transcribing audio with Whisper from OpenAI

Whisper is an open-source speech recognition model developed by OpenAI. It is not an off-the-shelf app: you install and run it yourself from Python, but it offers powerful transcription capabilities for those with technical skills. This guide is for readers happy to dive into the technical bits. If you're not technical, have a look at RambleFix, which is a fast, accurate off-the-shelf solution.

Prerequisites

  1. Python Environment: Ensure you have Python 3 installed. You can download it from the official Python website.
  2. Check for pip: Ensure you have pip, the Python package installer. It usually comes with Python; if it is missing, you can install it by running python -m ensurepip --upgrade.
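If you would rather check these prerequisites from Python itself, a minimal sketch follows. The function name check_environment and the minimum version (3, 8) are assumptions made for illustration; check the Whisper README for the Python versions it currently supports.

```python
import importlib.util
import sys

# Assumed minimum version for this sketch; confirm against the Whisper README.
MIN_PYTHON = (3, 8)


def check_environment(min_python=MIN_PYTHON):
    """Return a dict describing whether Python and pip look usable."""
    return {
        "python_version": sys.version_info[:3],
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when the pip module cannot be imported.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If pip_available comes back False, the ensurepip command above should restore it.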

1. Install Whisper

  1. Open your terminal or command prompt.
  2. Install Whisper using pip:
    pip install git+https://github.com/openai/whisper.git

2. Install Dependencies

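Whisper relies on other Python libraries, including NumPy and PyTorch. pip normally installs them together with Whisper, but you can install them explicitly by running:

pip install numpy torch

Whisper also needs the ffmpeg command-line tool to decode audio. pip does not install it, so use your system's package manager, for example brew install ffmpeg on macOS or sudo apt install ffmpeg on Debian/Ubuntu.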
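Before going further, it can help to confirm that everything Whisper needs is actually visible from your environment. Besides the Python libraries, Whisper calls the ffmpeg command-line tool to read audio, so the sketch below looks for that on your PATH as well. The helper name missing_requirements is hypothetical.

```python
import importlib.util
import shutil


def missing_requirements(modules=("numpy", "torch", "whisper"), tools=("ffmpeg",)):
    """Return the names of Python modules and command-line tools not found."""
    # find_spec returns None for a top-level module that cannot be imported.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [tool for tool in tools if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

An empty result means the imports and the ffmpeg lookup succeeded; it does not guarantee that the installed versions are compatible with each other.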

3. Download the Whisper Model

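Whisper comes in several model sizes (tiny, base, small, medium, large) that trade speed for accuracy: smaller models run faster and need less memory, while larger ones are more accurate. There is no separate download step. The model weights are downloaded automatically the first time you load a model and are cached for later runs. You choose the model by name when loading it, for example "base".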
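As a small illustration of choosing a model by name, the sketch below validates a name against the sizes listed in this guide before loading it. MODEL_SIZES, validate_model_name and load_model are hypothetical helpers, not part of Whisper. The tuple only covers the names mentioned here, so treat it as an assumption; whisper.available_models() reports the full list for your installed version.

```python
# Model names as listed in this guide, ordered from fastest to most accurate.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def validate_model_name(name):
    """Return a normalised model name, or raise ValueError if it is unknown."""
    cleaned = name.strip().lower()
    if cleaned not in MODEL_SIZES:
        raise ValueError(
            f"Unknown model {name!r}; choose one of {', '.join(MODEL_SIZES)}"
        )
    return cleaned


def load_model(name="base"):
    """Validate the name, then load the model (weights download on first use)."""
    # Imported lazily so validate_model_name works without Whisper installed.
    import whisper

    return whisper.load_model(validate_model_name(name))
```

Starting with "tiny" or "base" is a reasonable way to confirm that your setup works before trying the larger, slower models.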

4. Prepare Your Audio File

Ensure your audio file is in a format Whisper can read (e.g., MP3, WAV, M4A). Whisper decodes audio through ffmpeg, so most common formats work. Place the audio file in a directory you can reference from your script.
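A quick check of the path before transcribing can save a confusing error later. The sketch below confirms that the file exists and reports whether its extension is one of the formats named above. check_audio_path and KNOWN_EXTENSIONS are hypothetical names, and the extension set is deliberately limited to the examples in this guide.

```python
from pathlib import Path

# Extensions mentioned in this guide; ffmpeg can usually decode many more.
KNOWN_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_path(path):
    """Return (resolved_path, extension_is_known) for an existing file.

    Raises FileNotFoundError if nothing exists at the given path.
    """
    audio = Path(path).expanduser()
    if not audio.is_file():
        raise FileNotFoundError(f"No audio file at {audio}")
    return audio.resolve(), audio.suffix.lower() in KNOWN_EXTENSIONS
```

An unknown extension is only a hint to double-check the file, not proof that it will fail to transcribe.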

5. Transcribe the Audio File

Use a Python script to transcribe the audio file. Here's a simple example:

  1. Open a text editor and create a new Python script, e.g., transcribe.py.
  2. Add the following code to the script:
    import whisper
    
    # Load the Whisper model
    model = whisper.load_model("base")
    
    # Transcribe the audio file
    result = model.transcribe("path/to/your/audio/file.mp3")
    
    # Print the transcription
    print(result["text"])
  3. Replace "path/to/your/audio/file.mp3" with the actual path to your audio file.
  4. Save and close the script.
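The transcribe call also accepts optional settings. Two that are often useful are language, which tells Whisper the spoken language instead of letting it detect it, and fp16, which you can set to False when running on a CPU. The wrapper below is a minimal sketch. transcribe_file is a hypothetical helper and its defaults are assumptions, not recommendations from the Whisper authors.

```python
def transcribe_file(model, audio_path, language=None, use_fp16=False):
    """Transcribe one file with a loaded Whisper model and return the text."""
    options = {"fp16": use_fp16}
    if language is not None:
        # For example "en"; when omitted, Whisper detects the language itself.
        options["language"] = language
    result = model.transcribe(str(audio_path), **options)
    return result["text"].strip()
```

You would use it with a model loaded as in the script above, for example transcribe_file(model, "path/to/your/audio/file.mp3", language="en").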

6. Run the Transcription Script

  1. In the terminal or command prompt, navigate to the directory where you saved transcribe.py.
  2. Run the script using Python:
    python transcribe.py
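If you expect to transcribe more than one file, editing the path inside the script each time gets tedious. One option, sketched below, is to read the audio path and model name from the command line using argparse. The argument names audio and --model are choices made for this example.

```python
import argparse
import sys


def build_parser():
    """Create the command-line parser for the transcription script."""
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file with Whisper."
    )
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument(
        "--model", default="base", help="Whisper model name (default: base)"
    )
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        # No arguments given: show usage instead of failing.
        build_parser().print_help()
        return 0
    args = build_parser().parse_args(argv)

    # Imported after parsing so that --help works without loading Whisper.
    import whisper

    model = whisper.load_model(args.model)
    result = model.transcribe(args.audio)
    print(result["text"])
    return 0


if __name__ == "__main__":
    main()
```

Saved as transcribe.py, this would be run as python transcribe.py path/to/your/audio/file.mp3 --model small. Whisper also installs its own whisper command-line tool, which may be enough if you do not need a custom script.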

7. View the Output

The transcription will be printed to the terminal. You can modify the script to save the output to a file if needed:

import whisper

# Load the Whisper model
model = whisper.load_model("base")

# Transcribe the audio file
result = model.transcribe("path/to/your/audio/file.mp3")

# Save the transcription to a text file (UTF-8 keeps non-ASCII characters intact)
with open("transcription.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
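Besides the full text, the result also contains a "segments" list in which each entry has "start", "end" and "text" fields, with times given in seconds. If you want timestamps in your saved transcript, something like the following sketch could work. The helper names and the output format are assumptions made for this example.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(result):
    """Turn a transcription result into one timestamped line per segment."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]


def save_timestamped(result, path="transcription_timestamps.txt"):
    """Write the timestamped lines to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(segments_to_lines(result)) + "\n")
```

Calling save_timestamped(result) after model.transcribe(...) writes one line per segment, each prefixed with its start and end time.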