I am working on clustering applied to acoustic data.
Among the options for normalizing log-Mel spectrograms, I have been experimenting with normalization applied along the frequency-bin axis rather than over the whole spectrogram.
Can you help me understand the effects of normalizing along the frequency bins, and whether it has known applications in signal processing or bioacoustics?
Example:
Suppose that I have a tensor of shape (batch, 1, n_time, n_bins). I would get the min and max like this:
# Min/max along the frequency-bin axis (axis=3): one value per time frame
min_values = tf.math.reduce_min(log_mel_spectrograms, axis=3, keepdims=True)
max_values = tf.math.reduce_max(log_mel_spectrograms, axis=3, keepdims=True)
Instead of:
# Min/max over the whole tensor (no axis given, so the batch dimension is reduced too)
min_values = tf.math.reduce_min(log_mel_spectrograms)
max_values = tf.math.reduce_max(log_mel_spectrograms)
and finally norm_specs = 2.0 * (log_mel_spectrograms - min_values) / (max_values - min_values) - 1.0
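To make the comparison concrete, here is a minimal sketch of the variants side by side. It uses NumPy as a stand-in for the TensorFlow calls above; the helper name `min_max_normalize`, the `eps` guard against constant slices, and the random input are my own assumptions for illustration. With shape (batch, 1, n_time, n_bins), reducing over axis=3 gives one min/max per time frame (taken across the frequency bins), while reducing over axis=2 would give one min/max per frequency bin (taken across time).

```python
import numpy as np

def min_max_normalize(x, axis=None, eps=1e-8):
    """Scale x to roughly [-1, 1] using min/max taken over `axis`.

    axis=None reduces over the whole tensor, batch included.
    eps is an assumed guard against division by zero on constant slices.
    """
    min_values = x.min(axis=axis, keepdims=True)
    max_values = x.max(axis=axis, keepdims=True)
    return 2.0 * (x - min_values) / (max_values - min_values + eps) - 1.0

# Random stand-in data with shape (batch, 1, n_time, n_bins)
rng = np.random.default_rng(0)
log_mel_spectrograms = rng.normal(size=(4, 1, 100, 64))

# One min/max per time frame, taken across the frequency bins
per_frame = min_max_normalize(log_mel_spectrograms, axis=3)
# One min/max per frequency bin, taken across time
per_bin = min_max_normalize(log_mel_spectrograms, axis=2)
# One min/max per spectrogram in the batch
per_spec = min_max_normalize(log_mel_spectrograms, axis=(1, 2, 3))
# One min/max for the whole batch
global_norm = min_max_normalize(log_mel_spectrograms)
```

All four outputs keep the input shape; only the slice over which each min/max pair is shared changes.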
The results are quite different, and I want to understand whether I am introducing artifacts or whether this is a legitimate type of normalization. I could not find scientific literature on it, only an interpretation that it could be a legitimate way to 'decorrelate' the frequency bands and bring out the signal in bands where the energy is lower.
I would like to better understand the effect of normalization applied along the frequency bins, and ideally find references to literature describing its applications.
To illustrate the effect, see the pictures below: the first uses normalization over the whole spectrogram, the second normalization along the frequency axis. While the latter may push values to the extremes along the edges of the signal, it also enhances the signal, in particular in the lower bins. However, I also see vertical lines along the time axis, which I don't know how to interpret.
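One way to probe the vertical lines is a toy example. The sketch below is only a guess at the mechanism, not something I have confirmed on my data: it assumes the lines come from each time frame receiving its own gain when min/max are taken per frame. The spectrogram values (a quiet background near -80 dB with one loud cell) and the helper `scale` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_bins = 6, 8

# Toy log-mel spectrogram (n_time, n_bins): quiet background near -80 dB
spec = -80.0 + rng.uniform(-1.0, 1.0, size=(n_time, n_bins))
spec[3, 2] = -10.0  # hypothetical loud event in frame 3, bin 2

def scale(x, lo, hi):
    """Min-max scale x to [-1, 1] given lower and upper bounds."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# One min/max for the whole spectrogram
global_norm = scale(spec, spec.min(), spec.max())
# One min/max per time frame, taken across the frequency bins
per_frame = scale(
    spec,
    spec.min(axis=1, keepdims=True),
    spec.max(axis=1, keepdims=True),
)

# Spread (max - min across bins) of the normalized values in each frame
print(np.ptp(global_norm, axis=1))  # small in the quiet frames
print(np.ptp(per_frame, axis=1))    # full range in every frame
```

In this toy case every frame is stretched to the full range under per-frame scaling, so the small fluctuations in quiet frames are amplified and neighbouring frames no longer share a common scale. If the same happens in real spectrograms, it could show up as frame-wise discontinuities.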