mahnoor.fatima

Reputation: 1

Spectrogram PNG back to WAV Audio

I'm working on a peculiar task that I can't seem to get right. I have a set of audio clips that I convert into spectrogram PNGs (time-frequency plots saved with plt). I am training a GAN to produce similar spectrograms, and I want to convert the generated spectrograms back into .wav audio files.

Here is what I am using to get the spectrograms:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load one second of audio and compute its STFT (shape: freq_bins x frames)
y, sr = librosa.load(wav_file, sr=rate, duration=1)
stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
mag_spectrogram, phase_spectrogram = librosa.magphase(stft)

# Plot the magnitude in dB and save the bare image, without axes or padding
plt.figure(figsize=(10, 4))
librosa.display.specshow(librosa.amplitude_to_db(mag_spectrogram, ref=np.max), sr=sr, hop_length=hop_length, x_axis='time', y_axis='linear')
plt.gca().set_title('')
plt.colorbar().remove()
plt.axis('off')
plt.savefig("saved_mag_spec.png", bbox_inches='tight', pad_inches=0)

How can I load spectrograms saved as plt figures like this one so that I can reconstruct the audio from the images alone?

I do understand that this plot is not an exact representation of the spectrogram, which makes it a poor starting point for restoring the audio. I believe I could use istft to restore the audio if I kept track of the phase information for each of my spectrograms.

The issue is that the magnitude spectrogram, like the STFT output and the phase information, has the shape (freq_bins, frames). When I convert it into a PNG with librosa.display.specshow, the size and shape change, because the array is rendered as a time-frequency plot at the figure's resolution. How can I load this image back and keep track of the phase information so that I can use the spectrogram to produce audio?

Upvotes: 0

Views: 152

Answers (1)

dankal444

Reputation: 4158

It seems like you have made your life more difficult than it needs to be. Why convert the GAN output to a PNG file in the first place?

You should make the GAN produce its output in the form of an STFT, or of spectrograms that are "digestible" by istft (you need to convert the spectrograms to an STFT using np.exp). You could generate 2-channel spectrograms, with one channel each for the real and imaginary parts of the STFT. Given a good amount of data, the GAN should learn how to produce a "correct" phase.
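To make the two representations concrete, here is a minimal sketch using only NumPy. The helper names (stft_to_channels, channels_to_stft, polar_to_stft) and the array sizes are made up for illustration, and the random array merely stands in for a real librosa.stft output.

```python
import numpy as np

def stft_to_channels(stft):
    """Pack a complex STFT (freq_bins, frames) into a real array (2, freq_bins, frames)."""
    return np.stack([stft.real, stft.imag], axis=0)

def channels_to_stft(channels):
    """Rebuild the complex STFT from its real and imaginary channels."""
    return channels[0] + 1j * channels[1]

def polar_to_stft(magnitude, phase):
    """Combine a magnitude and a phase angle (in radians) into a complex STFT via np.exp."""
    return magnitude * np.exp(1j * phase)

# Stand-in for a real STFT; the shape (257, 63) is an arbitrary assumption
rng = np.random.default_rng(0)
stft = rng.standard_normal((257, 63)) + 1j * rng.standard_normal((257, 63))

channels = stft_to_channels(stft)                       # what the GAN would train on
restored = channels_to_stft(channels)                   # what the GAN output is turned back into
rebuilt = polar_to_stft(np.abs(stft), np.angle(stft))   # magnitude/phase alternative
```

Either complex array can then be passed to librosa.istft, using the same hop_length and win_length as in the forward transform, to get a waveform.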

Another idea is to train a separate neural network model that learns how to "restore" the phase of the spectrograms. Note: you can't really restore the original phase; it is lost for good once you apply the abs operation. But you can generate something that sounds reasonable.

Upvotes: 0
