Shamoon

Reputation: 43491

How can I convert spectrogram data to a tensor (or multidimensional numpy array)?

I am using Keras and have:

        corrupted_samples, corrupted_sample_rate = sf.read(
            self.corrupted_audio_file_paths[index])

        frequencies, times, spectrogram = scipy.signal.spectrogram(
            corrupted_samples, corrupted_sample_rate)

As per the docs, this gives:

f (ndarray) - Array of sample frequencies.
t (ndarray) - Array of segment times.
Sxx (ndarray) - Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.

I assume all of the times will line up, so I don't think I care about the time values, and the same goes for the frequencies. What I actually need is the value at each time for each frequency, which is Sxx (spectrogram in my code). I'm unsure how to actually get that into a tensor, though it seems like it should be simple.
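For reference, here is a minimal sketch of what the question seems to be asking for. It assumes a mono signal and uses a synthetic 440 Hz tone in place of the audio returned by sf.read; the variable name features and the (batch, time, frequency, channel) layout are illustrative choices, not something the original code prescribes.

```python
import numpy as np
from scipy import signal

# Synthetic stand-in for the samples returned by sf.read
# (assumption: mono audio sampled at 16 kHz).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440.0 * t)

frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

# Sxx is already a 2-D ndarray: rows are frequency bins, columns are time segments.
print(type(spectrogram), spectrogram.shape)

# Transpose to (time, frequency), then add batch and channel axes,
# a layout commonly used for Conv2D-style inputs (an assumption here).
features = spectrogram.T[np.newaxis, ..., np.newaxis].astype(np.float32)
print(features.shape)
```

Since the result is an ordinary NumPy array, it can usually be passed to a Keras model as is; stacking several such arrays with np.stack would give a batch, provided they all have the same shape.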

Upvotes: 3

Views: 2073

Answers (1)

wz 98

Reputation: 93

According to https://towardsdatascience.com/speech-recognition-analysis-f03ff9ce78e9, a spectrogram is a spectro-temporal representation of sound. The article also shows some of the steps for converting a WAV file to a spectrogram.

One example is shown below:

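import wave

import tensorflow as tf

# The tf.contrib calls below exist only in TensorFlow 1.x; eager execution
# is needed there for the .numpy() calls.
tf.enable_eager_execution()

# Check the sampling rate of the WAV file.
audio_file = './siren_mfcc_demo.wav'

with wave.open(audio_file, "rb") as wave_file:
    sr = wave_file.getframerate()
print(sr)

audio_binary = tf.read_file(audio_file)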

# tf.contrib.ffmpeg not supported on Windows, refer to issue
# https://github.com/tensorflow/tensorflow/issues/8271
waveform = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='wav', samples_per_second=sr, channel_count=1)
print(waveform.numpy().shape)

# Reshape to [batch_size, samples] with a batch of one.
signals = tf.reshape(waveform, [1, -1])
print(signals.get_shape())

# Compute a [batch_size, ?, 128] tensor of fixed length, overlapping windows
# where each window overlaps the previous by 75% (frame_length - frame_step
# samples of overlap).
frames = tf.contrib.signal.frame(signals, frame_length=128, frame_step=32)
print(frames.numpy().shape)

# `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms. We
# would like to produce overlapping fixed-size spectrogram patches; for example,
# for use in a situation where a fixed size input is needed.
magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
    signals, frame_length=256, frame_step=64, fft_length=256))

print(magnitude_spectrograms.numpy().shape)

The method above is taken from https://colab.research.google.com/drive/1Adcy25HYC4c9uSBDK9q5_glR246m-TSx#scrollTo=QTa1BVSOw1Oe
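If TensorFlow 1.x is not available, the framing and magnitude-spectrogram steps can be sketched in plain NumPy to see where the shapes come from. This is only an illustration under stated assumptions: the helper names frame and magnitude_stft are hypothetical, the periodic Hann window is chosen here for illustration, and a synthetic tone stands in for the decoded WAV file. It is not guaranteed to match the TensorFlow output value for value.

```python
import numpy as np

def frame(signal_1d, frame_length, frame_step):
    """Split a 1-D signal into overlapping frames (no padding at the end)."""
    num_frames = 1 + (len(signal_1d) - frame_length) // frame_step
    starts = frame_step * np.arange(num_frames)
    index = starts[:, None] + np.arange(frame_length)[None, :]
    return signal_1d[index]

def magnitude_stft(signal_1d, frame_length=256, frame_step=64, fft_length=256):
    """Magnitude of the windowed real FFT of each frame."""
    frames = frame(signal_1d, frame_length, frame_step)
    # Periodic Hann window (assumption: picked for this sketch).
    n = np.arange(frame_length)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_length)
    return np.abs(np.fft.rfft(frames * window, n=fft_length, axis=-1))

# Synthetic stand-in for the decoded audio (assumption: mono, 16 kHz, 1 second).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

frames = frame(waveform, frame_length=128, frame_step=32)
spec = magnitude_stft(waveform)
print(frames.shape)  # (497, 128)
print(spec.shape)    # (247, 129): fft_length // 2 + 1 frequency bins
```

The last axis has 129 entries because a real FFT of length 256 yields 256 // 2 + 1 non-redundant bins, which is the same reason the comment in the code above mentions a [batch_size, ?, 129] tensor.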

Hope this helps.

Upvotes: 2
