chaima rebah

Reputation: 31

How can I reconstruct an STFT back to audio?

In order to train an autoencoder model on audio data, as a first step I need to understand the different representations of audio found in the literature, such as the STFT (not the spectrogram; by STFT I mean the complex STFT coefficients), the spectrogram, MFCCs, etc. For this reason, I want to convert a .wav file into an STFT (not a spectrogram) or another representation, and then invert that representation to reconstruct a .wav file. I will listen to the reconstructed audio to assess whether it is too degraded, and to determine whether the chosen parameters of the representation are appropriate before using it with the model. I am currently attempting to implement the code, but I need some help with the reconstruction phase: how can I implement this in Python? Below is the code I have so far, without the reconstruction phase.


import os
import librosa
import librosa.display
import IPython.display as ipd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf 
scale_file = "/content/0-m-21-0-1-105.wav"
y,sr = librosa.load(scale_file, sr=16000)
###### waveform #######
plt.figure(figsize=(12,5))
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Waveform")
plt.plot(y)
plt.show()
##### STFT #######
n_fft=1024 # window_length
hop_length=512
window_type ='hann'
sample_rate = 16000
# calculate duration hop length and window in seconds
hop_length_duration = float(hop_length)/sample_rate
n_fft_duration = float(n_fft)/sample_rate
print(f"STFT hop length duration is: {hop_length_duration}s")
print(f"STFT window duration is: {n_fft_duration}s")

stft_lib = librosa.stft(y, n_fft=n_fft,
                        hop_length=hop_length,
                        win_length=n_fft,
                        window=window_type)

### spectrogram ####
spectrogram = np.abs(stft_lib)
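# NOTE: np.abs() discards the phase, so a magnitude spectrogram alone
# cannot be inverted exactly; either keep the complex STFT for
# librosa.istft or estimate the phase (e.g. with Griffin-Lim)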
#X_inv = librosa.griffinlim(np.abs(stft_lib))
log_spectrogram = librosa.amplitude_to_db(spectrogram)
librosa.display.specshow(log_spectrogram, sr=sample_rate,hop_length=hop_length)
plt.xlabel("Time")
plt.ylabel("Frequency")
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram (dB)")
plt.savefig("spectogram_log.png")
plt.show()

##### mel spectrogram ####

mel_spect = librosa.feature.melspectrogram(y=y, 
                                           sr = sr, 
                                           n_fft=n_fft,
                                           n_mels = 20,
                                           hop_length=hop_length,
                                           window='hann')
# melspectrogram returns a power spectrogram (power=2.0 by default),
# so convert to dB with power_to_db rather than amplitude_to_db
log_mel_spect = librosa.power_to_db(mel_spect)
librosa.display.specshow(log_mel_spect, sr=sample_rate, x_axis='time',
                         y_axis='mel', hop_length=hop_length)

plt.colorbar(format="%+2.0f dB")
plt.title("Log Mel spectrogram")
plt.tight_layout()
plt.savefig("Log Mel spectrogram.png")
plt.show()


I am attempting to train an autoencoder on audio data for a steganography application, and I am currently tuning the parameters of the data representation before feeding it into the model. To do this, I convert the audio into a representation, reconstruct it, and listen to the result. This approach helps me avoid the impact of choosing an inappropriate representation on the training outcome.

Upvotes: 1

Views: 540

Answers (1)

chaima rebah

Reputation: 31

If someone wants to do the same (.wav -> mel spectrogram, then reconstruct it later), you can find the explanation below:

import librosa

my_sample_rate = 16000
# step 1 - load the wav file as a numpy array (resampled to 16 kHz;
# without sr=..., librosa.load would resample to its 22050 Hz default)
my_audio_as_np_array, my_sample_rate = librosa.load("path_file", sr=my_sample_rate)

# step 2 - convert the audio array to a mel spectrogram
# (shape: n_mels x n_frames)
spec = librosa.feature.melspectrogram(y=my_audio_as_np_array,
                                      sr=my_sample_rate,
                                      n_fft=1024,
                                      hop_length=512,
                                      win_length=None,
                                      window='hann',
                                      center=True,
                                      pad_mode='reflect',
                                      power=2.0)
                                      # n_mels=128 by default

# step 3 - convert the mel spectrogram back to a waveform
# (mel_to_audio inverts the mel filterbank to an STFT magnitude, then
#  estimates the phase with n_iter iterations of Griffin-Lim)
res = librosa.feature.inverse.mel_to_audio(spec,
                                           sr=my_sample_rate,
                                           n_fft=1024,
                                           hop_length=512,
                                           win_length=None,
                                           window='hann',
                                           center=True,
                                           pad_mode='reflect',
                                           power=2.0,
                                           n_iter=32)
                                           # n_mels=128 by default

# step 4 - save the reconstruction as a wav file
import soundfile as sf
sf.write("test2.wav", res, my_sample_rate)

Alternatively, you can invert the mel spectrogram to an STFT magnitude yourself and run the Griffin-Lim algorithm to get back a waveform:

# invert the mel filterbank to an approximate STFT magnitude
# (n_fft must match the value used to build the mel spectrogram)
res = librosa.feature.inverse.mel_to_stft(spec,
                                          sr=my_sample_rate,
                                          n_fft=1024)

# estimate the phase and reconstruct the waveform with Griffin-Lim
y = librosa.griffinlim(res,
                       n_fft=1024,
                       hop_length=512,
                       win_length=None,
                       window='hann',
                       center=True,
                       pad_mode='reflect')
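
For the plain STFT asked about in the question, no phase estimation is needed at all: as long as you keep the complex STFT coefficients (i.e. before taking np.abs), librosa.istft inverts them almost perfectly. A minimal round-trip sketch, assuming the same parameters as in the question (the output filename is arbitrary):

import librosa
import soundfile as sf

y, sr = librosa.load("/content/0-m-21-0-1-105.wav", sr=16000)

# complex STFT: keeps both magnitude and phase
stft = librosa.stft(y, n_fft=1024, hop_length=512,
                    win_length=1024, window='hann')

# inverse STFT: near-perfect reconstruction because the phase was kept;
# length=len(y) trims the padding so the output matches the input length
y_rec = librosa.istft(stft, hop_length=512, win_length=1024,
                      window='hann', length=len(y))

sf.write("reconstructed_stft.wav", y_rec, sr)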

Upvotes: 1
