Mel Spectrogram feature extraction to CNN

Question

This question is related to the one posted here, but it focuses on the CNN input. I am using the following feature extraction function:

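import librosa
import numpy as np

max_pad_len = 174  # target number of time frames (only used by the commented-out padding)
n_mels = 128       # number of mel bands, i.e. rows of the spectrogram

def extract_features(file_name):
    try:
        # librosa resamples to 22050 Hz by default; 'kaiser_fast' is a faster resampler
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Result has shape (n_mels, n_frames); n_frames depends on the clip length
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        #pad_width = max_pad_len - mely.shape[1]
        #mely = np.pad(mely, pad_width=((0, 0), (0, pad_width)), mode='constant')

    except Exception as e:
        # Report the underlying error as well as the file name
        print("Error encountered while parsing file: ", file_name, e)
        return None

    return mely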
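
The commented-out padding lines suggest every spectrogram is meant to end up with max_pad_len time frames. As a rough sketch of that step (the helper name pad_or_truncate is hypothetical, and truncating long clips is an assumption, since np.pad raises an error when pad_width is negative), something like this would give every clip the same width:

```python
import numpy as np

max_pad_len = 174  # target number of time frames, as in the question

def pad_or_truncate(mel, target_len=max_pad_len):
    """Return mel with exactly target_len columns (time frames).

    Hypothetical helper: clips longer than target_len are cut,
    shorter ones are zero-padded on the right.
    """
    n_frames = mel.shape[1]
    if n_frames >= target_len:
        return mel[:, :target_len]
    pad_width = target_len - n_frames
    return np.pad(mel, pad_width=((0, 0), (0, pad_width)), mode='constant')

# Synthetic stand-ins for mel spectrograms of a short and a long clip
short_clip = np.ones((128, 100), dtype=np.float32)
long_clip = np.ones((128, 300), dtype=np.float32)

print(pad_or_truncate(short_clip).shape)
print(pad_or_truncate(long_clip).shape)
```

With a fixed width, every file yields an array of shape (n_mels, max_pad_len), which is what makes it possible to stack the features into one array later.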

How do I determine the correct values of num_rows, num_columns and num_channels for the train and test data?

When constructing the CNN model, how do I determine the correct input shape?

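# Imports assume the tf.keras API; standalone Keras uses keras.models / keras.layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout

model = Sequential()
# input_shape describes one sample: (rows, columns, channels), without the batch dimension
model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))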

Answers (1)
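
One possible way to derive the three values, sketched here with random arrays standing in for real spectrograms: if every spectrogram has been padded or cut to the same width, num_rows is the number of mel bands (n_mels), num_columns is the number of time frames (max_pad_len), and num_channels is 1 because a mel spectrogram is a single 2D "image". The variable names features and X are assumptions for illustration.

```python
import numpy as np

n_mels = 128        # mel bands -> rows
max_pad_len = 174   # time frames after padding -> columns

# Stand-in for the list of per-file spectrograms, each (n_mels, max_pad_len)
rng = np.random.default_rng(0)
features = [rng.random((n_mels, max_pad_len), dtype=np.float32) for _ in range(10)]

# Stack into one array of shape (num_samples, n_mels, max_pad_len)
X = np.stack(features)

# Read the dimensions off the data instead of hard-coding them
num_rows, num_columns = X.shape[1], X.shape[2]
num_channels = 1  # one channel, like a grayscale image

# Conv2D expects 4D input: (num_samples, rows, columns, channels)
X = X.reshape(X.shape[0], num_rows, num_columns, num_channels)

# Shape of a single sample, without the batch dimension
input_shape = (num_rows, num_columns, num_channels)

print(X.shape)
print(input_shape)
```

The same reshape would be applied to both the train and the test arrays, and the tuple (num_rows, num_columns, num_channels) is what goes into input_shape of the first Conv2D layer, assuming the default channels-last data format.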
