Transcribing mp3 to text (python) --> "RIFF id" error

I am trying to convert an mp3 file to text, but my code fails with the error shown below. Any help is appreciated!

I am testing with a sample mp3 file. Below is what I have tried:

import speech_recognition as sr
print(sr.__version__)
r = sr.Recognizer()

file_audio = sr.AudioFile(r"C:\Users\Andrew\Podcast.mp3")

with file_audio as source:
    audio_text = r.record(source)

print(type(audio_text))
print(r.recognize_google(audio_text))

The full error I get appears to be:

Error: file does not start with RIFF id

Thank you for your help!

Upvotes: 6

Answers (3)

user3888329

Reputation: 1

# Import the required libraries
import speech_recognition as sr  # Library for speech recognition
import os  # Library for interacting with the operating system
from pydub import AudioSegment  # Library for working with audio files
from pydub.silence import split_on_silence  # Function for splitting audio files based on silence

# Create a speech recognition object
recognizer = sr.Recognizer()

def transcribe_large_audio_file(audio_path):
    """
    Split audio into chunks and apply speech recognition
    """
    # Load audio file with pydub
    audio = AudioSegment.from_mp3(audio_path)
    # Split audio at silent parts with duration of 700ms or more and obtain chunks
    audio_chunks = split_on_silence(audio, min_silence_len=700, silence_thresh=audio.dBFS-14, keep_silence=700)

    # Create a directory to store audio chunks
    chunks_dir = "audio-chunks"
    if not os.path.isdir(chunks_dir):
        os.mkdir(chunks_dir)

    full_text = ""
    failed_attempts = 0
    # Process each audio chunk
    for i, chunk in enumerate(audio_chunks, start=1):
        # Save chunk in the directory
        chunk_file_name = os.path.join(chunks_dir, f"chunk{i}.wav")
        chunk.export(chunk_file_name, format="wav")
        # Recognize audio from the chunk
        with sr.AudioFile(chunk_file_name) as src:
            # Read the whole chunk into memory
            listened_audio = recognizer.record(src)
            # Convert audio to text
            try:
                text = recognizer.recognize_google(listened_audio)
            except (sr.UnknownValueError, sr.RequestError):
                failed_attempts += 1
                if failed_attempts == 5:
                    print(f"Skipping {audio_path} due to too many errors")
                    break
            else:
                failed_attempts = 0
                text = f"{text.capitalize()}. "
                print(chunk_file_name, ":", text)
                full_text += text
    # Return the transcription for all chunks
    return full_text

# Define the output directory
output_dir = "C:\\Store\\output"

# Create the output directory if it does not exist
os.makedirs(output_dir, exist_ok=True)

# Create a list of processed files
processed_files = []

# Iterate through all .mp3 files in the directory and transcribe them
with open(os.path.join(output_dir, 'result.txt'), 'w') as result_file:
    for file in os.listdir(output_dir):
        # Process only .mp3 files that have not been processed before
        if file.endswith(".mp3") and file not in processed_files:
            mp3_file_path = os.path.join(output_dir, file)
            print(f"Processing {mp3_file_path}")
            try:
                # Transcribe the audio file
                transcription = transcribe_large_audio_file(mp3_file_path)
            except LookupError as error:
                # If there is an error, skip the file and continue with the next one
                print(f"Skipping {mp3_file_path} due to error: {error}")
                continue
            else:
                # Save the transcription to a text file with the same name as the audio file
                txt_file_path = os.path.join(output_dir, f"{os.path.splitext(file)[0]}.txt")
                with open(txt_file_path, 'w') as txt_file:
                    txt_file.write(transcription)
                # Print the transcription and the path to the saved text file
                print(transcription)
                print(f"Transcription saved to {txt_file_path}")
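                # Save the transcription to the combined result file
                result_file.write(f"{file}: {transcription}\n")
                # Remember the file so it is not processed twice
                processed_files.append(file)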

Upvotes: -1

Meghshyam Sonar

Reputation: 301

You need to convert the mp3 to wav first, and then you can transcribe it. Below is a modified version of your code.

import speech_recognition as sr
from pydub import AudioSegment

# convert mp3 file to wav
src = r"C:\Users\Andrew\Podcast.mp3"
dst = r"C:\Users\Andrew\Podcast.wav"
sound = AudioSegment.from_mp3(src)
sound.export(dst, format="wav")

file_audio = sr.AudioFile(dst)

# use the audio file as the audio source
r = sr.Recognizer()
with file_audio as source:
    audio_text = r.record(source)

print(type(audio_text))
print(r.recognize_google(audio_text))

The modified code first converts the mp3 file to wav and then transcribes the wav file. Note that pydub needs ffmpeg (or libav) installed to decode mp3.
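
The error message itself hints at the cause: a WAV file starts with the four bytes RIFF (followed later by WAVE), and an mp3 does not. Python's standard wave module uses this exact wording when it is handed something that is not a WAV file. As a rough sketch, you could check the header before passing a file to sr.AudioFile. The helper names looks_like_wav and is_wav_file are made up for this example and are not part of speech_recognition.

```python
def looks_like_wav(header: bytes) -> bool:
    """Return True if the first 12 bytes look like a RIFF/WAVE header."""
    # Bytes 0-3 hold the chunk id "RIFF", bytes 8-11 hold the format "WAVE".
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def is_wav_file(path) -> bool:
    """Hypothetical helper: read the first 12 bytes of a file and check them."""
    with open(path, "rb") as handle:
        return looks_like_wav(handle.read(12))
```

If is_wav_file returns False for your input, convert the file first as shown above.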

Upvotes: 5

Andreas B

Reputation: 42

One thing you can do is convert your mp3 to wav. When I tested with an mp3 file I got the same error as you, but after converting, your code runs fine. It might also be possible to write your code so it accepts mp3s directly, but that is where my knowledge ends.

Maybe someone else knows more than I do and can post it. But if you just want to test, you can use something like Audacity to convert the file for now.
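
One possible way to let the same code accept mp3s, sketched under the assumption that pydub and ffmpeg are installed: decode the file with pydub and hand speech_recognition an in-memory WAV instead of a path. The names needs_conversion and transcribe_file are hypothetical helpers for this example, and the third-party imports are done inside the function so the small helper can be used without them.

```python
import io
from pathlib import Path


def needs_conversion(path) -> bool:
    """Treat anything without a .wav extension as needing conversion."""
    return Path(path).suffix.lower() != ".wav"


def transcribe_file(path):
    """Sketch: transcribe a wav file directly, or an mp3 via an in-memory WAV."""
    # Third-party imports (assumed installed): SpeechRecognition and pydub.
    import speech_recognition as sr

    if needs_conversion(path):
        from pydub import AudioSegment  # needs ffmpeg to decode mp3

        wav_source = io.BytesIO()
        AudioSegment.from_file(str(path)).export(wav_source, format="wav")
        wav_source.seek(0)  # rewind so AudioFile reads from the start
    else:
        wav_source = str(path)

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_source) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```

Calling transcribe_file(r"C:\Users\Andrew\Podcast.mp3") would then take the conversion branch without writing a temporary wav file to disk.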

You might also run into problems with large files; I read something online about that. But there is nothing stopping you from trying.

Here is the website for that:

https://www.geeksforgeeks.org/python-speech-recognition-on-large-audio-files/
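
For long recordings, one simple alternative to silence-based splitting is to cut the audio into fixed-length windows and transcribe each one separately. This is only a sketch, assuming pydub and ffmpeg are installed. The names window_bounds and transcribe_in_windows and the 30-second default are choices made for this example, not recommendations from either library.

```python
import io


def window_bounds(total_ms, window_ms):
    """Return (start, end) millisecond pairs that together cover total_ms."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return [
        (start, min(start + window_ms, total_ms))
        for start in range(0, total_ms, window_ms)
    ]


def transcribe_in_windows(audio_path, window_ms=30_000):
    """Sketch: transcribe a long file one fixed-length window at a time."""
    # Third-party imports (assumed installed): SpeechRecognition and pydub.
    import speech_recognition as sr
    from pydub import AudioSegment

    recognizer = sr.Recognizer()
    audio = AudioSegment.from_file(str(audio_path))
    pieces = []
    # len() of an AudioSegment is its duration in milliseconds.
    for start, end in window_bounds(len(audio), window_ms):
        buffer = io.BytesIO()
        audio[start:end].export(buffer, format="wav")
        buffer.seek(0)
        with sr.AudioFile(buffer) as source:
            clip = recognizer.record(source)
        try:
            pieces.append(recognizer.recognize_google(clip))
        except sr.UnknownValueError:
            continue  # nothing intelligible in this window
    return " ".join(pieces)
```

Fixed windows can cut a word in half at a boundary, which is the trade-off compared with splitting on silence.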

Upvotes: 1
