I want to build an audio fingerprint, so I need to get an array of spectrogram peaks. I've tried to find a solution online, but found nothing.
Here is my spectrogram code:
```python
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd

def MEL_SPECTOGRAM(signal, sr, fileName):
    # Show an audio player (only works inside a Jupyter notebook)
    ipd.display(ipd.Audio(signal, rate=sr))
    # this is the number of samples in a window per fft
    n_fft = 2048
    # The amount of samples we are shifting after each fft
    hop_length = 512
    audio_stft = librosa.stft(signal, hop_length=hop_length, n_fft=n_fft)
    spectrogram = np.abs(audio_stft)
    log_spectro = librosa.amplitude_to_db(spectrogram)
    # normalize() returns a new array, so keep the result
    log_spectro = librosa.util.normalize(log_spectro)
    librosa.display.specshow(log_spectro, sr=sr, n_fft=n_fft, hop_length=hop_length,
                             cmap='magma', win_length=n_fft)
    plt.show()
```
[spectrogram example](https://i.sstatic.net/u0zKd.png)
https://www.youtube.com/watch?v=oCHeGesfJe8&ab_channel=Wolfram
Peak finding in a 2D array is a common operation in computer vision, so a good way to do this in Python is to lean on an image-processing library such as `scipy.ndimage`.
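The core idea can be shown on a tiny array: a cell is a local peak if it equals the maximum of its neighbourhood, and `scipy.ndimage.maximum_filter` computes that neighbourhood maximum for every cell at once. This is a minimal sketch; the toy values, the 3x3 neighbourhood and the 0.1 threshold are made-up illustration choices.

```python
import numpy as np
import scipy.ndimage

# Toy "spectrogram": rows are frequency bins, columns are frames.
Y = np.zeros((6, 8))
Y[1, 2] = 1.0  # strong peak
Y[1, 3] = 0.5  # weaker neighbour of the strong peak (should be suppressed)
Y[4, 6] = 0.8  # isolated peak

# Replace every value with the maximum of its 3x3 neighbourhood
# (cells outside the array count as 0 because of mode='constant').
local_max = scipy.ndimage.maximum_filter(Y, size=3, mode='constant')

# A cell is a peak if it equals its local maximum and exceeds the threshold.
peaks = (Y == local_max) & (Y > 0.1)

# Each row is (frequency bin, frame): [[1, 2], [4, 6]]
print(np.argwhere(peaks).tolist())
```

The threshold matters: in silent regions every cell equals its (zero) local maximum, so without it the whole background would be reported as peaks.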
One of the best resources explaining the landmark/constellation approach to audio fingerprinting (as used by Shazam, etc.) is the Audio Identification notebook in Chapter 7 of Fundamentals of Music Processing (FMP). It contains Python code for computing the constellation map in the function `compute_constellation_map`.
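In the landmark approach, the constellation map is only the first step: the fingerprint itself is built by pairing each peak (the anchor) with a few later peaks and using `(f1, f2, dt)` as a hash. The sketch below is not taken from the FMP notebook or from the code in this answer; `peak_pair_hashes`, `fan_out` and `max_dt` are hypothetical names and values chosen for illustration, and it assumes `Cmap` is a boolean (frequency bins × frames) array such as `compute_constellation_map` returns.

```python
import numpy as np

def peak_pair_hashes(Cmap, fan_out=5, max_dt=50):
    """Pair each peak with up to fan_out later peaks within max_dt frames.

    Returns a list of ((f1, f2, dt), t1) tuples: the hash triple plus the
    anchor frame t1, which is what a fingerprint database would store.
    """
    # (frequency bin, frame) for every peak, sorted by frame, then by bin.
    peaks = np.argwhere(Cmap)
    peaks = peaks[np.lexsort((peaks[:, 0], peaks[:, 1]))]
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        paired = 0
        for f2, t2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # skip other peaks in the same frame
            # Peaks are sorted by frame, so once dt is too large we can stop.
            if dt > max_dt or paired >= fan_out:
                break
            hashes.append(((int(f1), int(f2), int(dt)), int(t1)))
            paired += 1
    return hashes

# Toy constellation map with three peaks at (bin, frame) = (2, 1), (5, 3), (7, 4).
toy = np.zeros((10, 10), dtype=bool)
toy[2, 1] = toy[5, 3] = toy[7, 4] = True
toy_hashes = peak_pair_hashes(toy)
print(toy_hashes)  # [((2, 5, 2), 1), ((2, 7, 3), 1), ((5, 7, 1), 3)]
```

Because each hash encodes only frequency bins and a time difference, it does not change when the query starts at a different point in the recording; matching then looks for many hashes that agree on a consistent offset between query and reference anchor times.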
Below is complete, runnable code based on the above resource. I have only made a few fixes for compatibility with modern librosa.
```python
import librosa
import numpy as np
import scipy.ndimage

def compute_spectrogram(fn_wav, Fs=22050, N=2048, H=1024, bin_max=128, frame_max=None):
    """Load an audio file and return its magnitude STFT, optionally cropped.

    By default only the lowest bin_max frequency bins are kept.
    """
    x, Fs = librosa.load(fn_wav, sr=Fs)
    X = librosa.stft(x, n_fft=N, hop_length=H, win_length=N, window='hamming')
    if bin_max is None:
        bin_max = X.shape[0]  # keep all frequency bins
    if frame_max is None:
        frame_max = X.shape[1]  # keep all frames (time is axis 1)
    Y = np.abs(X[:bin_max, :frame_max])
    return Y

def compute_constellation_map(Y, dist_freq=7, dist_time=7, thresh=0.01):
    """Compute constellation map (implementation using image processing)

    Args:
        Y (np.ndarray): Spectrogram (magnitude)
        dist_freq (int): Neighborhood parameter for frequency direction (kappa) (Default value = 7)
        dist_time (int): Neighborhood parameter for time direction (tau) (Default value = 7)
        thresh (float): Threshold parameter for minimal peak magnitude (Default value = 0.01)

    Returns:
        Cmap (np.ndarray): Boolean mask for peak structure (same size as Y)
    """
    # Each cell becomes the maximum of its (2*dist_freq+1) x (2*dist_time+1) neighbourhood
    result = scipy.ndimage.maximum_filter(Y, size=[2*dist_freq+1, 2*dist_time+1], mode='constant')
    # Peaks are cells equal to their local maximum and above the threshold
    Cmap = np.logical_and(Y == result, result > thresh)
    return Cmap

path = librosa.example('nutcracker')
spec = compute_spectrogram(path)
Cmap = compute_constellation_map(spec, dist_freq=7, dist_time=3)
print(Cmap.shape)
```
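Since the question asks for an array of peaks rather than a mask, the boolean map can be turned into (time, frequency) pairs. This is a minimal sketch: `constellation_to_peaks` is a hypothetical helper, not part of the FMP notebook, and it assumes the spectrogram was computed with librosa's default centered STFT, using the same `Fs`, `N` and `H` as `compute_spectrogram` (22050, 2048, 1024), with bins sliced from 0 as `compute_spectrogram` does.

```python
import numpy as np

def constellation_to_peaks(Cmap, Fs=22050, N=2048, H=1024):
    """Convert a boolean constellation map into an array of peaks.

    Returns an array of shape (num_peaks, 2) with columns
    (time in seconds, frequency in Hz), sorted by time, then frequency.
    """
    # np.argwhere gives (row, col) = (frequency bin, frame) for every True cell.
    bins, frames = np.argwhere(Cmap).T
    # With a centered STFT, frame n is centred at n * H / Fs seconds.
    times = frames * H / Fs
    # Bin k of an N-point FFT corresponds to k * Fs / N Hz.
    freqs = bins * Fs / N
    peaks = np.column_stack([times, freqs])
    # np.lexsort uses the last key as the primary one: time first, then frequency.
    order = np.lexsort((freqs, times))
    return peaks[order]

# Toy map: peaks at bin 100 / frame 3 and bin 50 / frame 7.
toy = np.zeros((128, 10), dtype=bool)
toy[100, 3] = True
toy[50, 7] = True
toy_peaks = constellation_to_peaks(toy)
print(toy_peaks)
```

With the variables from the answer you would call `constellation_to_peaks(Cmap)`. Note that with the default `bin_max=128`, `compute_spectrogram` keeps only bins 0 to 127, i.e. frequencies up to roughly 1.37 kHz at these settings, so every peak falls in that range; pass `bin_max=None` to search the full spectrum.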
Here is some plotting code to show the output, again based on the notebooks above.
```python
import matplotlib.pyplot as plt

def plot_constellation_map(Cmap, Y=None, xlim=None, ylim=None, title='',
                           xlabel='Time (frames)', ylabel='Frequency (bins)',
                           s=5, color='r', marker='o', figsize=(7, 3), dpi=72):
    if Cmap.ndim > 1:
        (K, N) = Cmap.shape
    else:
        K = Cmap.shape[0]
        N = 1
    if Y is None:
        Y = np.zeros((K, N))
    fig, ax = plt.subplots(1, 1, figsize=figsize, dpi=dpi)
    # Spectrogram as the background image, low frequencies at the bottom
    im = ax.imshow(Y, origin='lower', aspect='auto', cmap='gray_r', interpolation='nearest')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    Fs = 1
    if xlim is None:
        xlim = [-0.5/Fs, (N-0.5)/Fs]
    if ylim is None:
        ylim = [-0.5/Fs, (K-0.5)/Fs]
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    # argwhere returns (row, col) = (frequency bin k, frame n) for each peak
    k, n = np.argwhere(Cmap == 1).T
    ax.scatter(n, k, color=color, s=s, marker=marker)
    plt.tight_layout()
    return fig, ax, im

fig, ax, im = plot_constellation_map(Cmap, np.log(1 + 1 * spec), color='r', s=20, figsize=(15, 5))
fig.savefig('constellation-map.png')
```
Running it saves the plot as constellation-map.png: the log-magnitude spectrogram in grayscale, with the detected peaks overlaid as red dots.