Reputation: 23
Hey, I'm a total layman when it comes to audio processing, so my question will be very basic. I have audio from 2 groups, X and Y, as .wav samples, and I need to build a model that correctly classifies whether a sound belongs to X or Y. I found out how to load the data into a list, then I converted it to a DataFrame with 2 columns (the second column holds 8000 elements in each row):
0 1
0 2000 [0.1329449, 0.14544961, 0.19810106, 0.21718721...
1 2000 [-0.30273795, -0.6065889, -0.4967722, -0.47117...
2 2000 [-0.07037315, -0.6685449, -0.48479277, -0.4535...
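For reference, my loading step looks roughly like the sketch below (the folder names X and Y and the column names are just placeholders for how my files are organised):
import os
import pandas as pd
from scipy.io import wavfile

rows = []
for label in ("X", "Y"):
    for fname in sorted(os.listdir(label)):
        if fname.endswith(".wav"):
            rate, signal = wavfile.read(os.path.join(label, fname))
            rows.append((rate, signal, label))

df = pd.DataFrame(rows, columns=["rate", "signal", "label"])
print(df.head())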
So far I have found these potentially useful features in the python_speech_features module:
import pandas as pd
import python_speech_features as psf
from scipy.io import wavfile as sw

rate, signal = sw.read(i)                                      # i = path to one .wav file
mfcc_feat = psf.base.mfcc(signal, samplerate=rate)             # 13 MFCCs per frame
fbank_feat, energy = psf.base.fbank(signal, samplerate=rate)   # Mel filterbank energies
logfbank_feat = psf.base.logfbank(signal, samplerate=rate)     # log filterbank energies
lifted = psf.base.lifter(mfcc_feat, L=22)                      # cepstral liftering of the MFCCs
delta_feat = psf.base.delta(mfcc_feat, N=2)                    # delta features (2-frame window)
features = pd.DataFrame(mfcc_feat)
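If I understand correctly, these frame-level features could then be summarised per file and fed to a classifier, something like the sketch below (extract_features, paths_X and paths_Y are just placeholders I made up, and the scikit-learn classifier is only one possible choice):
import numpy as np
import python_speech_features as psf
from scipy.io import wavfile as sw
from sklearn.ensemble import RandomForestClassifier

def extract_features(path):
    # Fixed-length vector per file: mean and std of the MFCCs over all frames
    rate, signal = sw.read(path)
    mfcc_feat = psf.base.mfcc(signal, samplerate=rate)
    return np.concatenate([mfcc_feat.mean(axis=0), mfcc_feat.std(axis=0)])

# paths_X and paths_Y are lists of .wav paths for the two groups
X_data = np.array([extract_features(p) for p in paths_X + paths_Y])
y_data = np.array([0] * len(paths_X) + [1] * len(paths_Y))

clf = RandomForestClassifier(n_estimators=200).fit(X_data, y_data)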
I will appreciate any kind of help. Additional resources for self-learning will be highly welcome as well.
Upvotes: 1
Views: 1714
Reputation: 2533
I've had great success in converting audio files to mel spectrograms and using a basic CNN to classify the images. The following function requires the librosa library:
import librosa as lr

def audio_to_image(path, height=192, width=192):
    # Load the audio (librosa resamples to 22050 Hz by default)
    signal, sr = lr.load(path, res_type='kaiser_fast')
    # Pick a hop length so the spectrogram ends up slightly wider than `width`
    hl = signal.shape[0] // (width * 1.1)
    spec = lr.feature.melspectrogram(y=signal, sr=sr, n_mels=height, hop_length=int(hl))
    # Convert power to decibels and square to stretch the contrast
    img = lr.power_to_db(spec)**2
    # Crop the centre `width` columns so every image has the same size
    start = (img.shape[1] - width) // 2
    return img[:, start:start + width]
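Applied to your two groups, building the image arrays could look roughly like this (the X/ and Y/ folder layout is just an assumption about how your files are stored):
import glob
import numpy as np

# Assumed layout: one folder of .wav files per class
images_X = np.stack([audio_to_image(p) for p in glob.glob('X/*.wav')])
images_Y = np.stack([audio_to_image(p) for p in glob.glob('Y/*.wav')])

images = np.concatenate([images_X, images_Y])[..., np.newaxis]  # add channel axis
labels = np.concatenate([np.zeros(len(images_X)), np.ones(len(images_Y))])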
The result of audio_to_image is a square mel-spectrogram image. While there is little human intuition behind these images, CNNs can classify them fairly well. Play a little with different resolutions and settings. Let me know how this works for you.
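For illustration, a small CNN along these lines could look something like the following; this is just a sketch using Keras, not the exact architecture from my project:
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(192, 192, 1)),          # one spectrogram image, single channel
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),      # binary output: group X vs group Y
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(images, labels, epochs=10, validation_split=0.2)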
EDIT: Here is the full code of my own project, which classifies audio samples of speech by their spoken language.
Upvotes: 2