Transcribing audio with Whisper from OpenAI
Whisper is an advanced speech recognition system developed by OpenAI. It is not an off-the-shelf product, but it offers powerful transcription capabilities for those with technical skills. This guide is for readers happy to dive into the technical bits; if you're not technical, have a look at RambleFix, a fast, accurate off-the-shelf solution.
Prerequisites
- Python Environment: Ensure you have Python installed. You can download it from the official Python website.
- Install pip: Ensure you have pip, the Python package installer. It usually comes with Python, but you can install it by running:
python -m ensurepip --upgrade
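Before installing anything, it can help to confirm what is already on the machine. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is made up for this example. It also looks for the ffmpeg command-line tool, which Whisper uses to decode audio files.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report the Python version and whether pip and ffmpeg can be found."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # find_spec returns None when a module cannot be located.
        "pip_available": importlib.util.find_spec("pip") is not None,
        # shutil.which returns None when the program is not on PATH.
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


for name, value in check_prerequisites().items():
    print(f"{name}: {value}")
```

If ffmpeg_path prints None, install ffmpeg with your system's package manager before transcribing.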
1. Install Whisper
- Open your terminal or command prompt.
- Install Whisper using pip:
pip install git+https://github.com/openai/whisper.git
2. Install Dependencies
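Whisper relies on other libraries, including NumPy and PyTorch. pip normally installs them alongside Whisper in the previous step; if they are missing, install them by running: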
pip install numpy torch
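To confirm that the installation steps worked, a short check like the one below can be run. It is only a sketch: missing_packages is a hypothetical helper, and it tests whether each package can be located without actually importing it.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "whisper", "numpy" and "torch" are the import names of the packages above.
missing = missing_packages(["whisper", "numpy", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages are importable.")
```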
3. Download the Whisper Model
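Whisper has different models (tiny, base, small, medium, large) that trade speed for accuracy: smaller models run faster, while larger ones are more accurate. Choose the model based on your requirements. There is no separate download step; the model you name in your script is downloaded automatically the first time it is loaded. For example, to use the "base" model, specify it when loading the model for transcription.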
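One way to guard against a mistyped model name is to validate it before loading. The sketch below assumes the five names listed above; MODEL_SIZES, validate_model_name and load_whisper_model are names invented for this example, while whisper.load_model is the library call used later in this guide.

```python
# Model names taken from the list above, ordered from fastest to most accurate.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def validate_model_name(name):
    """Return the normalised model name, or raise ValueError if it is unknown."""
    cleaned = name.strip().lower()
    if cleaned not in MODEL_SIZES:
        raise ValueError(f"Unknown model {name!r}; expected one of {MODEL_SIZES}")
    return cleaned


def load_whisper_model(name="base"):
    """Load a Whisper model; the weights are downloaded on first use."""
    import whisper  # imported here so validation works without Whisper installed

    return whisper.load_model(validate_model_name(name))
```

Calling load_whisper_model("small") would then return a model ready for transcription, and a typo such as "smal" fails immediately with a readable message.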
4. Prepare Your Audio File
Whisper decodes audio through the ffmpeg command-line tool, so install ffmpeg and make sure it is on your PATH. Common formats such as MP3, WAV, and M4A are supported. Place the audio file in an accessible directory.
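A wrong path is a common cause of failed transcriptions, so it may be worth checking the file before handing it to Whisper. This is a small sketch using pathlib; check_audio_file and EXAMPLE_EXTENSIONS are hypothetical names, and the extension set only mirrors the examples above rather than a complete list of supported formats.

```python
from pathlib import Path

# Extensions from the examples above; this is not an exhaustive list.
EXAMPLE_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_file(path):
    """Return a resolved Path if the file exists, else raise FileNotFoundError."""
    audio = Path(path).expanduser()
    if not audio.is_file():
        raise FileNotFoundError(f"No audio file at {audio}")
    if audio.suffix.lower() not in EXAMPLE_EXTENSIONS:
        # Only a note: other formats may still work.
        print(f"Note: {audio.suffix or 'no extension'} is not one of the example formats.")
    return audio.resolve()
```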
5. Transcribe the Audio File
Use a Python script to transcribe the audio file. Here's a simple example:
- Open a text editor and create a new Python script, e.g., transcribe.py.
- Add the following code to the script:
import whisper

# Load the Whisper model
model = whisper.load_model("base")

# Transcribe the audio file
result = model.transcribe("path/to/your/audio/file.mp3")

# Print the transcription
print(result["text"])
- Replace "path/to/your/audio/file.mp3" with the actual path to your audio file.
- Save and close the script.
6. Run the Transcription Script
- In the terminal or command prompt, navigate to the directory where you saved transcribe.py.
- Run the script using Python:
python transcribe.py
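If you transcribe more than one file, editing the path inside the script each time gets tedious. The variant below is a sketch that reads the audio path and model name from the command line using argparse; the --model option and the function names are choices made for this example, not part of Whisper itself.

```python
import argparse
import sys


def build_parser():
    """Create the command-line parser for the transcription script."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with Whisper.")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument("--model", default="base", help="Whisper model name (default: base)")
    return parser


def main(argv=None):
    """Parse arguments, load the chosen model, and print the transcription."""
    args = build_parser().parse_args(argv)
    import whisper  # imported late so argument errors are reported quickly

    model = whisper.load_model(args.model)
    result = model.transcribe(args.audio)
    print(result["text"])


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main()
    else:
        # No arguments given: show a one-line usage summary instead of failing.
        build_parser().print_usage()
```

Saved as, say, transcribe_cli.py, it could be run with: python transcribe_cli.py path/to/your/audio/file.mp3 --model small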
7. View the Output
The transcription will be printed to the terminal. You can modify the script to save the output to a file if needed:
import whisper
# Load the Whisper model
model = whisper.load_model("base")
# Transcribe the audio file
result = model.transcribe("path/to/your/audio/file.mp3")
# Save the transcription to a text file
with open("transcription.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
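Besides the full text, the dictionary returned by transcribe also carries a "segments" list, where each segment has "start" and "end" times in seconds along with its "text". The sketch below formats those segments as timestamped lines; the helper names are made up, and sample_result is invented data standing in for a real transcription so the example runs without Whisper.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(result):
    """Turn a transcription result into one '[start - end] text' line per segment."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Stand-in for the dictionary returned by model.transcribe(); values are made up.
sample_result = {
    "text": " Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 2.4, "text": " Hello there."},
        {"start": 2.4, "end": 65.0, "text": " How are you?"},
    ],
}

for line in segments_to_lines(sample_result):
    print(line)
```

In a real script, pass the result from model.transcribe(...) instead of sample_result and write the lines to a file, one per line.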