John

Reputation: 47

Increase val_acc in audio classification

I have 530 data points belonging to 10 classes. I am not sure which numbers I should use for num_rows and num_columns.

In this code I have num_rows = 40, num_columns = 174:

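# Imports the snippet relies on (assuming tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.regularizers import l2

model = Sequential()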
model.add(Conv2D(filters=32, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Conv2D(filters=64, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001),  activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
#model.add(GlobalAveragePooling2D())

model.add(Flatten())
model.add(Dense(512, activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(256, activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))

model.add(Dense(10, activation='softmax'))
# Compile the model
#opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer="Adam")

[Plot: model loss]

Upvotes: 2

Views: 63

Answers (1)

Lukasz Tracewski

Reputation: 11377

I am guessing your input is some sort of spectrogram (you are working with audio, yet the input shape is 3-dimensional). Your input_shape has to match the size of the images you pass in. Simply check their height and width: these are your num_rows and num_columns, respectively.
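As a rough illustration of reading the shape off the data, here is a minimal NumPy sketch. The `features` array is a hypothetical stand-in for whatever your own feature extraction produces; the sizes 530, 40 and 174 are taken from the question, and the single channel is an assumption.

```python
import numpy as np

# Hypothetical stand-in for the extracted features:
# 530 samples, each a 40 x 174 single-channel spectrogram.
features = np.zeros((530, 40, 174), dtype=np.float32)

# With the default channels_last layout, Conv2D expects
# (height, width, channels) per sample.
num_rows, num_columns = features.shape[1], features.shape[2]
num_channels = 1  # assumption: one intensity value per pixel

# Add the trailing channel axis so each sample is (num_rows, num_columns, 1).
x = features[..., np.newaxis]
input_shape = (num_rows, num_columns, num_channels)

print(x.shape)
print(input_shape)
```

Whatever `input_shape` comes out of this check is what the first Conv2D layer should receive.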

According to that code, the images have 3 colour bands. That makes sense for photos, but rarely for spectrograms. Remember that these are false colours, typically generated to make the visualisation pleasing to the eye, and they add nothing for classification. A single channel is enough: the pixel intensity reflects the strength (amplitude) of the signal.
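If the spectrograms are already stored as 3-band images, one possible way to collapse them is sketched below. The helper `to_single_channel` is hypothetical and simply averages the bands. Note that averaging a false-colour image does not exactly recover the original amplitude, so saving the raw spectrogram array instead of a rendered picture is probably the cleaner route.

```python
import numpy as np

def to_single_channel(image):
    """Return an (H, W, 1) array from an (H, W) or (H, W, C) input.

    Colour bands are averaged; this is a simple assumption, not a
    faithful inverse of whatever colour map produced the image.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim == 2:
        # Already monochrome: just add the channel axis.
        return image[..., np.newaxis]
    return image.mean(axis=-1, keepdims=True)

# Hypothetical 3-band spectrogram image of the size used in the question.
rgb = np.random.default_rng(0).random((40, 174, 3))
mono = to_single_channel(rgb)
print(mono.shape)
```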

Three simple things you can do:

  • Use monochromatic images, e.g. input_shape=(num_rows, num_columns, 1). Colour only confuses the classifier.
  • Get more data and use augmentation.
  • kernel_size=2 makes little sense. Read up on convolutions and what kernels are first.
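For the augmentation point, a minimal sketch of one common approach for spectrograms: a random shift along the time axis plus random masking of a frequency band and a time span. The function name `augment_spectrogram` and the parameters `max_shift` and `max_mask` are made up for illustration, and whether these transforms suit your data is something to verify on a validation set.

```python
import numpy as np

def augment_spectrogram(spec, rng, max_shift=20, max_mask=10):
    """Return a randomly shifted and masked copy of a (rows, columns) array."""
    fill = spec.min()  # masked regions are set to the quietest value

    # Random circular shift along the time axis (columns).
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(spec, shift, axis=1)  # np.roll returns a new array

    # Mask a random band of frequency rows (possibly zero rows wide).
    f = int(rng.integers(0, max_mask + 1))
    f0 = int(rng.integers(0, out.shape[0] - f + 1))
    out[f0:f0 + f, :] = fill

    # Mask a random span of time columns (possibly zero columns wide).
    t = int(rng.integers(0, max_mask + 1))
    t0 = int(rng.integers(0, out.shape[1] - t + 1))
    out[:, t0:t0 + t] = fill

    return out

rng = np.random.default_rng(0)
spec = rng.random((40, 174)).astype(np.float32)  # hypothetical spectrogram
original = spec.copy()
augmented = augment_spectrogram(spec, rng)
print(augmented.shape)
```

Applying this only to the training split, with a fresh random draw each epoch, keeps the validation accuracy comparable between runs.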

Upvotes: 3
