Reputation: 83
I am trying to convert an mp3 file to text, but my code returns the error outlined below. Any help is appreciated!
I am working with a sample mp3 file. Below is what I have tried:
import speech_recognition as sr
print(sr.__version__)
r = sr.Recognizer()
file_audio = sr.AudioFile(r"C:\Users\Andrew\Podcast.mp3")
with file_audio as source:
    audio_text = r.record(source)
print(type(audio_text))
print(r.recognize_google(audio_text))
The full error I get appears to be:
Error: file does not start with RIFF id
Thank you for your help!
Upvotes: 6
Views: 9938
Reputation: 1
# Import the required libraries
import speech_recognition as sr # Library for speech recognition
import os # Library for interacting with the operating system
from pydub import AudioSegment # Library for working with audio files
from pydub.silence import split_on_silence # Function for splitting audio files based on silence
# Create a speech recognition object
recognizer = sr.Recognizer()
def transcribe_large_audio_file(audio_path):
    """
    Split audio into chunks and apply speech recognition
    """
    # Load audio file with pydub
    audio = AudioSegment.from_mp3(audio_path)
    # Split audio at silent parts with duration of 700ms or more and obtain chunks
    audio_chunks = split_on_silence(audio, min_silence_len=700, silence_thresh=audio.dBFS-14, keep_silence=700)
    # Create a directory to store audio chunks
    chunks_dir = "audio-chunks"
    if not os.path.isdir(chunks_dir):
        os.mkdir(chunks_dir)
    full_text = ""
    failed_attempts = 0
    # Process each audio chunk
    for i, chunk in enumerate(audio_chunks, start=1):
        # Save chunk in the directory
        chunk_file_name = os.path.join(chunks_dir, f"chunk{i}.wav")
        chunk.export(chunk_file_name, format="wav")
        # Read the whole chunk back from disk (AudioFile + record, as in the question's code)
        with sr.AudioFile(chunk_file_name) as src:
            listened_audio = recognizer.record(src)
        # Convert audio to text
        try:
            text = recognizer.recognize_google(listened_audio)
        except (sr.UnknownValueError, sr.RequestError):
            # Unintelligible speech or a failed API request counts as a failed attempt
            failed_attempts += 1
            if failed_attempts == 5:
                print(f"Skipping {audio_path} due to too many errors")
                break
        else:
            failed_attempts = 0
            text = f"{text.capitalize()}. "
            print(chunk_file_name, ":", text)
            full_text += text
    # Return the transcription for all chunks
    return full_text
# Define the output directory
output_dir = "C:\\Store\\output"
# Create the output directory if it does not exist
os.makedirs(output_dir, exist_ok=True)
# Create a list of processed files
processed_files = []
# Iterate through all .mp3 files in the directory and transcribe them
with open(os.path.join(output_dir, 'result.txt'), 'w') as result_file:
    for file in os.listdir(output_dir):
        # Process only .mp3 files that have not been processed before
        if file.endswith(".mp3") and file not in processed_files:
            mp3_file_path = os.path.join(output_dir, file)
            print(f"Processing {mp3_file_path}")
            try:
                # Transcribe the audio file
                transcription = transcribe_large_audio_file(mp3_file_path)
            except Exception as error:
                # If anything fails while decoding or transcribing, skip the file and continue with the next one
                print(f"Skipping {mp3_file_path} due to error: {error}")
                continue
            else:
                # Save the transcription to a text file with the same name as the audio file
                txt_file_path = os.path.join(output_dir, f"{os.path.splitext(file)[0]}.txt")
                with open(txt_file_path, 'w') as txt_file:
                    txt_file.write(transcription)
                # Print the transcription and the path to the saved text file
                print(transcription)
                print(f"Transcription saved to {txt_file_path}")
                # Save the transcription to the result (assumed format: one transcription per line)
                result_file.write(f"{transcription}\n")
                # Mark the file as processed
                processed_files.append(file)
Upvotes: -1
Reputation: 301
You first need to convert the mp3 to wav; then you can transcribe it. Below is a modified version of your code.
import speech_recognition as sr
from pydub import AudioSegment
# convert mp3 file to wav
src = r"C:\Users\Andrew\Podcast.mp3"
dst = r"C:\Users\Andrew\Podcast.wav"  # raw string, so "\U" is not read as an escape
sound = AudioSegment.from_mp3(src)
sound.export(dst, format="wav")
file_audio = sr.AudioFile(dst)
# use the audio file as the audio source
r = sr.Recognizer()
with file_audio as source:
    audio_text = r.record(source)
print(type(audio_text))
print(r.recognize_google(audio_text))
In the modified code above, the mp3 file is first converted to wav and then transcribed.
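For context, the "file does not start with RIFF id" message comes from Python's wave module, which expects a WAV file to begin with the bytes RIFF. An mp3 does not, so sr.AudioFile cannot read it directly. Below is a minimal sketch of a pre-check you could run before deciding whether to convert; the helper name looks_like_wav is my own and not part of either library.

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF/WAVE header (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(12)
    # A WAV file begins with b"RIFF", then a 4-byte size field, then b"WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

If this returns False for your file, convert it with pydub as shown above before passing it to sr.AudioFile.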
Upvotes: 5
Reputation: 42
One thing you can do is convert your mp3 to wav. When testing with an mp3 file, I got the same error as you, but after converting, your code runs fine. It might also be possible to write your code so it can use mp3s directly, but that is where my knowledge ends.
Maybe someone else knows more than I do and can post it. But if you just want to test, you can use something like Audacity to convert it for now.
You might also have problems with large files; I read something online about that. But there's nothing stopping you from trying.
Here is the website for that:
https://www.geeksforgeeks.org/python-speech-recognition-on-large-audio-files/
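As a rough illustration of the large-file point, one simple alternative to splitting on silence is to cut the audio into fixed-length windows and transcribe each one separately. This is only a sketch: the function name chunk_bounds and the 60-second default are my own assumptions, not something taken from the linked article.

```python
def chunk_bounds(total_ms, chunk_ms=60_000):
    """Return (start, end) millisecond pairs that cover [0, total_ms) in order."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # The last window is clipped so it never runs past the end of the audio.
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]
```

With pydub, len(audio) gives the duration in milliseconds and audio[start:end] slices by milliseconds, so each pair can be used to cut out one piece, export it as wav, and transcribe it. Fixed windows can split a word in half, which is the trade-off compared with splitting on silence.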
Upvotes: 1