Formatting mp3 files to spectrograms for CNN machine learning

Question

I've had a lot of fun and learned a lot from playing with the MNIST dataset, but I want to branch out into working with my own training data. I'm using Theano with Keras. However, I'm having an extremely difficult time conceptualizing how to create training data.

I looked over the structure of MNIST and saw that, as data, the first piece of it is the classification of what the image is, and the rest of it is the individual pixel brightness from 0 to 255.

My first thought is to use spectrograms like this or this, but what I'm not understanding is how to structure the data so that a CNN could read it. Any ideas or suggestions?

Aditya · Accepted Answer

I may not be able to answer this question accurately.

I will try to explain the format of the MNIST dataset using the following example.

Consider that you wish to classify images with four distinct class labels in your training data: car, bike, bus and airplane. All the images are color images, with each of the R, G and B channels taking a value from 0 to 255. I'm excluding the alpha (opacity) value as an attribute for this example. All the images have been resized to 28x28 (just an arbitrary dimension). This gives us 784 pixels, and since each pixel has 3 values for RGB, this gives us 784x3 = 2352 attribute values.
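The arithmetic above can be checked with a short NumPy sketch. The image here is random data standing in for a real photo, and the variable names are only illustrative.

```python
import numpy as np

height, width, channels = 28, 28, 3

# Hypothetical image: random 8-bit RGB values stand in for a real photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(height, width, channels), dtype=np.uint8)

# Flatten the 28 x 28 x 3 array into a single attribute vector.
flat = image.reshape(-1)
n_attributes = flat.size  # 28 * 28 * 3

print(n_attributes)
```

Flattening is what a plain fully connected network would need; a CNN would normally keep the 28x28x3 shape, so the flat vector is just a storage format here.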

The output for each instance (in this MNIST-style dataset) is expressed as a one-hot vector. The one-hot vectors for car, bike, bus and airplane are 1000, 0100, 0010 and 0001 respectively.
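As a minimal sketch of that encoding, assuming the class order car, bike, bus, airplane used above (the `one_hot` helper is a hypothetical name, not part of any library):

```python
import numpy as np

# Class order is an assumption; it fixes which position each label occupies.
classes = ["car", "bike", "bus", "airplane"]

def one_hot(label, classes=classes):
    """Return a vector of zeros with a single 1 at the label's position."""
    vec = np.zeros(len(classes), dtype=np.uint8)
    vec[classes.index(label)] = 1
    return vec

for name in classes:
    print(name, one_hot(name))
```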

Suppose that this MNIST-style dataset has 1000 instances for training. It will then consist of 1000 tuples, where each tuple is a combination of an input vector (2352 attributes long) and an output vector (a one-hot representation, 4 attributes long).

For added clarity, it looks like this:

([12, 51, 16, 17, ......., 12], [0, 0, 1, 0])

([55, 125, 71, 244, ....., 10], [1, 0, 0, 0])

......

Here the first list in each tuple is 2352 in length and the second list is 4 in length. There would be a total of 1000 tuples, each representing a training instance.
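A rough sketch of that structure in NumPy follows. The pixel values and labels are random placeholders, and the final reshape assumes a channels-last layout of (instances, height, width, channels); some backends expect the channel axis first, so check which order yours uses.

```python
import numpy as np

n_instances, n_attributes, n_classes = 1000, 2352, 4
rng = np.random.default_rng(0)

# Hypothetical data: random pixels and random labels stand in for real images.
dataset = []
for _ in range(n_instances):
    pixels = rng.integers(0, 256, size=n_attributes, dtype=np.uint8)
    target = np.zeros(n_classes, dtype=np.uint8)
    target[rng.integers(n_classes)] = 1  # one-hot label
    dataset.append((pixels, target))

# Stack the tuples into two arrays and scale pixel values to [0, 1].
X = np.stack([p for p, _ in dataset]).astype("float32") / 255.0
y = np.stack([t for _, t in dataset])

# Restore the image shape for a CNN (channels-last is an assumption).
X_images = X.reshape(n_instances, 28, 28, 3)

print(X.shape, y.shape, X_images.shape)
```

Arrays shaped like `X_images` and `y` are the kind of input a Keras-style `fit` call generally takes, as opposed to a Python list of tuples.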

If you wish, you can take a look at this code wherein I have created a dataset similar to the MNIST format.
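To tie this back to the spectrograms in the question, here is a minimal sketch of turning an audio signal into a 2D array that can be stored in the same way as the images above. It assumes the mp3 has already been decoded into a 1D array of samples (decoding is not shown), and it uses a synthetic 440 Hz tone as a stand-in. The `spectrogram` function, the frame length and the hop size are illustrative choices, not a recommended configuration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a simple short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)

# Stand-in for decoded audio: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)

# Compress the dynamic range and scale to [0, 1], like normalized pixels.
log_spec = np.log1p(spec)
log_spec = log_spec / log_spec.max()

# Add batch and channel axes: one "grayscale image" per clip.
cnn_input = log_spec[np.newaxis, :, :, np.newaxis]

print(spec.shape, cnn_input.shape)
```

Each clip then plays the role of one image, with a single channel instead of three, and is paired with a one-hot label exactly as in the tuples above. Clips would need to be cut or padded to the same length so that every spectrogram has the same shape.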
