Reputation: 21
I am trying to update the feature extraction pipeline of a speech command recognition model by replacing the function audio_ops.audio_spectrogram() with tf.contrib.signal.stft(). I assumed the two were equivalent, but I am obtaining different spectrogram values for the same input audio. Could someone explain the relation between the two methods, and whether it is possible to obtain the same results using tf.contrib.signal.stft()?
My code:
1) The audio_ops method:
from tensorflow.contrib.framework.python.ops import audio_ops
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import io_ops
# WAV audio loader
wav_filename_placeholder_ = tf.placeholder(tf.string, [], name='wav_filename')
wav_loader = io_ops.read_file(wav_filename_placeholder_)
sample_rate = 16000
desired_samples = 16000  # 1 second of audio
wav_decoder = audio_ops.decode_wav(wav_loader,
                                   desired_channels=1,
                                   desired_samples=desired_samples)

# Compute the spectrogram
spectrogram = audio_ops.audio_spectrogram(wav_decoder.audio,
                                          window_size=320,
                                          stride=160,
                                          magnitude_squared=False)

with tf.Session() as sess:
    feed_dict = {wav_filename_placeholder_: "/<folder_path>/audio_sample.wav"}
    # Get the input audio and the spectrogram
    audio_ops_wav_decoder_audio, audio_ops_spectrogram = sess.run(
        [wav_decoder.audio, spectrogram], feed_dict)
2) The tf.contrib.signal method:
# Input WAV audio (fed with the same signal decoded above: wav_decoder.audio)
signals = tf.placeholder(tf.float32, [None, None])

# Compute the STFTs and take the absolute values
stfts = tf.contrib.signal.stft(signals,
                               frame_length=320,
                               frame_step=160,
                               fft_length=512,
                               window_fn=None)
magnitude_spectrograms = tf.abs(stfts)

with tf.Session() as sess:
    feed_dict = {signals: audio_ops_wav_decoder_audio.reshape(1, 16000)}
    tf_original, tf_stfts, tf_spectrogram = sess.run(
        [signals, stfts, magnitude_spectrograms], feed_dict)
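For reference, a small sanity check along these lines shows the mismatch (variable names come from the two snippets above; the shapes assume the 1-second, 16 kHz clip):

import numpy as np

# audio_ops.audio_spectrogram returns [channels, frames, fft_bins];
# tf.contrib.signal.stft returns [batch, frames, fft_length // 2 + 1].
# With 16000 samples, window_size=320 and stride=160, both should yield
# 99 frames and 257 frequency bins (fft_length 512), so the arrays are
# directly comparable.
print(audio_ops_spectrogram.shape)  # e.g. (1, 99, 257)
print(tf_spectrogram.shape)         # e.g. (1, 99, 257)
print(np.max(np.abs(audio_ops_spectrogram - tf_spectrogram)))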
Thank you in advance.
Upvotes: 2
Views: 2642
Reputation: 73
I found these helpful comments on GitHub that discuss the differences:
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-345741527
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-443553788
You can think of audio_ops.audio_spectrogram and audio_ops.mfcc as "fused" ops (like the fused batch-norm or fused LSTM cells that TensorFlow has) for the ops in tf.contrib.signal. I think the original motivation for them was that a fused op makes it easier to provide mobile support. Long term, I think it would be nice if we removed them and provided automatic fusing via XLA, or unified the API to match the tf.contrib.signal API and provided fused keyword arguments to tf.contrib.signal functions, like we do for tf.layers.batch_normalization.
audio_spectrogram is a C++ implementation of an STFT, while tf.signal.stft uses TensorFlow ops to compute the STFT (and thus has CPU, GPU and TPU support).
The main cause of difference between them is that audio_spectrogram uses fft2d to compute FFTs while tf.contrib.signal.stft uses Eigen (CPU), cuFFT (GPU), and XLA (TPU). There is another very minor difference, which is that the default periodic Hann window used by each is slightly different. tf.contrib.signal.stft follows numpy/scipy's definition.
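In practice you can get the two pipelines much closer by windowing explicitly. Note that the code in the question passes window_fn=None, which disables windowing entirely, whereas audio_spectrogram applies its Hann window internally. A minimal sketch, using tf.contrib.signal's default periodic Hann window and the same 320/160/512 parameters as above:

# Sketch, not a drop-in fix: apply tf.contrib.signal's default periodic
# Hann window instead of no window at all (window_fn=None). The remaining
# small mismatch comes from the slightly different Hann definitions and
# FFT backends mentioned above.
stfts = tf.contrib.signal.stft(signals,
                               frame_length=320,
                               frame_step=160,
                               fft_length=512,
                               window_fn=tf.contrib.signal.hann_window)
magnitude_spectrograms = tf.abs(stfts)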
Upvotes: 4