Reputation: 127
import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy as np
import io
from PIL import Image

# Load the audio (librosa resamples to its default 22050 Hz)
samples, sample_rate = librosa.load('thabo.wav')

# Figure with no axes, ticks, or frame, so only the spectrogram pixels are saved
fig = plt.figure(figsize=[4, 4])
ax = fig.add_subplot(111)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)

# Mel spectrogram converted to dB, rendered with librosa's display helper
S = librosa.feature.melspectrogram(y=samples, sr=sample_rate)
librosa.display.specshow(librosa.power_to_db(S, ref=np.max))

# Render the figure into an in-memory PNG and open it with PIL
buf = io.BytesIO()
plt.savefig(buf, bbox_inches='tight', pad_inches=0)
# plt.close('all')
buf.seek(0)
im = Image.open(buf)
# im = Image.open(buf).convert('L')
im.show()
buf.close()
This is the FFmpeg command whose output I am trying to match:

ffmpeg -i thabo.wav -lavfi showspectrumpic=s=224x224:mode=separate:legend=disabled spectrogram.png
Please help: I want a spectrogram that is exactly the same as the one produced by FFmpeg, for use with a speech recognition model exported from Google's Teachable Machine. Recognition has to work offline.
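One way to guarantee a pixel-identical image is to invoke ffmpeg itself from Python rather than re-creating the rendering in matplotlib. A minimal sketch, assuming an ffmpeg binary is on PATH; the ffmpeg_spectrogram helper name and the 224x224 size (taken from the command above) are illustrative:

import subprocess
import io
from PIL import Image

def ffmpeg_spectrogram(wav_path, size='224x224'):
    """Render a spectrogram PNG with ffmpeg and return it as a PIL image.

    Delegating to ffmpeg guarantees the image matches showspectrumpic's
    output, because it *is* that output (assumes ffmpeg is on PATH).
    """
    cmd = [
        'ffmpeg', '-v', 'error',
        '-i', wav_path,
        '-lavfi', f'showspectrumpic=s={size}:mode=separate:legend=disabled',
        '-c:v', 'png', '-f', 'image2pipe', '-',
    ]
    png_bytes = subprocess.run(cmd, check=True, capture_output=True).stdout
    return Image.open(io.BytesIO(png_bytes))

im = ffmpeg_spectrogram('thabo.wav')
im.show()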
Upvotes: 1
Views: 2326
Reputation: 133723
You can pipe the audio directly to ffmpeg, which avoids the intermediate file, and ffmpeg can output to a pipe as well if you want to avoid writing an image file. Demonstration using three instances of ffmpeg:
ffmpeg -i input.wav -f wav - | ffmpeg -i - -filter_complex "showspectrumpic=s=224x224:mode=separate:legend=disabled" -c:v png -f image2pipe - | ffmpeg -y -i - output.png
The first and last ffmpeg instances will, of course, be replaced with your particular processes for your workflow.
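For example, in a Python workflow the outer two instances can be replaced by writing the WAV bytes to ffmpeg's stdin and reading the PNG from its stdout. A sketch under that assumption (the spectrogram_from_wav_bytes helper is illustrative, not part of any library):

import subprocess
import io
from PIL import Image

def spectrogram_from_wav_bytes(wav_bytes, size='224x224'):
    """Pipe in-memory WAV data through ffmpeg and get the spectrogram
    back as a PIL image, with no intermediate files on either side."""
    cmd = [
        'ffmpeg', '-v', 'error',
        '-f', 'wav', '-i', 'pipe:0',   # read WAV from stdin
        '-filter_complex', f'showspectrumpic=s={size}:mode=separate:legend=disabled',
        '-c:v', 'png', '-f', 'image2pipe', 'pipe:1',  # write PNG to stdout
    ]
    result = subprocess.run(cmd, input=wav_bytes, capture_output=True, check=True)
    return Image.open(io.BytesIO(result.stdout))

with open('thabo.wav', 'rb') as f:
    im = spectrogram_from_wav_bytes(f.read())
im.show()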
Upvotes: 1