Nau_Mar

Reputation: 1

How to get a mel-spectrogram peaks array in Python?

I want to make an audio fingerprint, so I need to get an array of the spectrogram's peaks. I've searched the internet for a solution but found nothing.

Here is the spectrogram example:

import librosa, librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from FFT import FFT

def MEL_SPECTOGRAM(signal, sr, fileName):
    ipd.Audio(signal, rate=sr)
    # this is the number of samples in a window per fft
    n_fft = 2048
    # The amount of samples we are shifting after each fft
    hop_length = 512

    audio_stft = librosa.core.stft(signal, hop_length=hop_length, n_fft=n_fft)
    spectrogram = np.abs(audio_stft)
    log_spectro = librosa.amplitude_to_db(spectrogram)

    # normalize returns a new array, so keep the result
    log_spectro = librosa.util.normalize(log_spectro)

    librosa.display.specshow(log_spectro, sr=sr, n_fft=n_fft, hop_length=hop_length, cmap='magma', win_length=n_fft)

    plt.plot()
    plt.show()

[mel-spectrogram example](https://i.sstatic.net/u0zKd.png)

The best solution I found was this video, but unfortunately it is written in Wolfram, so I can't use it:

https://www.youtube.com/watch?v=oCHeGesfJe8&ab_channel=Wolfram


Upvotes: 0

Views: 239

Answers (1)

Jon Nordby

Reputation: 6259

Peak finding in a 2D array is a common operation in computer vision, so a good way to do this in Python is to lean on an image-processing library such as scipy.ndimage.
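To illustrate the idea on a toy array, here is a minimal sketch (the array contents and the 0.5 threshold are made up for the example). It marks a cell as a peak when it equals the maximum of its 3x3 neighbourhood and exceeds the threshold:

```python
import numpy as np
import scipy.ndimage

# Toy 5x5 "image" with two isolated bright spots (illustrative values)
Y = np.zeros((5, 5))
Y[1, 1] = 3.0
Y[3, 4] = 2.0

# Replace each cell by the maximum of its 3x3 neighbourhood;
# cells outside the array count as 0 (mode='constant')
local_max = scipy.ndimage.maximum_filter(Y, size=3, mode='constant')

# A peak equals its neighbourhood maximum and lies above the threshold
peak_mask = np.logical_and(Y == local_max, Y > 0.5)

# One (row, column) pair per detected peak
peak_coords = np.argwhere(peak_mask)
print(peak_coords)
```

The same pattern applies to a magnitude spectrogram, where rows are frequency bins and columns are time frames.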

One of the best resources explaining the landmark/constellation approach to audio fingerprinting (as used by Shazam etc.) is the Audio Identification notebook in Chapter 7 of Fundamentals of Music Processing. It contains Python code for computing the constellation map, in the function compute_constellation_map.

Below is complete, runnable code based on the above resource. I have only made a few fixes for compatibility with modern librosa.

import librosa
import numpy as np
import scipy.ndimage

def compute_spectrogram(fn_wav, Fs=22050, N=2048, H=1024, bin_max=128, frame_max=None):
    x, Fs = librosa.load(fn_wav, sr=Fs)
    x_duration = len(x) / Fs
    X = librosa.stft(x, n_fft=N, hop_length=H, win_length=N, window='hamming')
    if bin_max is None:
        bin_max = X.shape[0]
    if frame_max is None:
        frame_max = X.shape[1]  # number of frames is the second axis
    Y = np.abs(X[:bin_max, :frame_max])
    return Y

def compute_constellation_map(Y, dist_freq=7, dist_time=7, thresh=0.01):
    """Compute constellation map (implementation using image processing)    
    Args:
        Y (np.ndarray): Spectrogram (magnitude)
        dist_freq (int): Neighborhood parameter for frequency direction (kappa) (Default value = 7)
        dist_time (int): Neighborhood parameter for time direction (tau) (Default value = 7)
        thresh (float): Threshold parameter for minimal peak magnitude (Default value = 0.01)

    Returns:
        Cmap (np.ndarray): Boolean mask for peak structure (same size as Y)
    """
    result = scipy.ndimage.maximum_filter(Y, size=[2*dist_freq+1, 2*dist_time+1], mode='constant')
    Cmap = np.logical_and(Y == result, result > thresh)
    return Cmap

path = librosa.example('nutcracker')
spec = compute_spectrogram(path)
Cmap = compute_constellation_map(spec, dist_freq=7, dist_time=3)
print(Cmap.shape)
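Since the question asks for an array of peaks rather than a mask, the boolean Cmap can be turned into coordinates with np.argwhere. The following is a rough sketch, not part of the notebook: peaks_from_mask is a hypothetical helper, and its defaults are assumed to match the Fs, N and H used in compute_spectrogram above. A small synthetic mask stands in for the real Cmap so that the snippet runs on its own:

```python
import numpy as np

def peaks_from_mask(Cmap, Fs=22050, N=2048, H=1024):
    """Convert a boolean peak mask into arrays of peak positions.

    Hypothetical helper; Fs, N and H are assumed to be the sample rate,
    FFT size and hop length used when the spectrogram was computed.
    """
    # One (frequency_bin, frame) row per True cell, in row-major order
    bins_frames = np.argwhere(Cmap)
    # Bin k of an N-point FFT corresponds to k * Fs / N Hz
    freqs_hz = bins_frames[:, 0] * Fs / N
    # Frame n sits n * H samples into the signal
    times_s = bins_frames[:, 1] * H / Fs
    return bins_frames, freqs_hz, times_s

# Synthetic mask standing in for a real constellation map
demo = np.zeros((128, 10), dtype=bool)
demo[64, 2] = True
demo[10, 5] = True

idx, freqs, times = peaks_from_mask(demo)
print(idx)    # peak positions as (frequency_bin, frame) pairs
print(freqs)  # the same peaks in Hz
print(times)  # the same peaks in seconds
```

With the real data, calling peaks_from_mask(Cmap) would return the peak positions both as bin/frame indices and in physical units, which is the usual starting point for building fingerprint hashes.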

Here is some plotting code to show the output, again based on the notebook above.


import matplotlib.pyplot as plt

def plot_constellation_map(Cmap, Y=None, xlim=None, ylim=None, title='',
                           xlabel='Time (sample)', ylabel='Frequency (bins)',
                           s=5, color='r', marker='o', figsize=(7, 3), dpi=72):
    if Cmap.ndim > 1:
        (K, N) = Cmap.shape
    else:
        K = Cmap.shape[0]
        N = 1
    if Y is None:
        Y = np.zeros((K, N))
    fig, ax = plt.subplots(1, 1, figsize=figsize, dpi=dpi)
    im = ax.imshow(Y, origin='lower', aspect='auto', cmap='gray_r', interpolation='nearest')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    Fs = 1
    if xlim is None:
        xlim = [-0.5/Fs, (N-0.5)/Fs]
    if ylim is None:
        ylim = [-0.5/Fs, (K-0.5)/Fs]
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    n, k = np.argwhere(Cmap == 1).T
    ax.scatter(k, n, color=color, s=s, marker=marker)
    plt.tight_layout()
    return fig, ax, im

fig, ax, im = plot_constellation_map(Cmap, np.log(1 + 1 * spec), color='r', s=20, figsize=(15, 5))
fig.savefig('constellation-map.png')

Running it should save an image of the log-compressed spectrogram with the detected peaks overlaid as red dots.

Upvotes: 0
