nyana ndev

Reputation: 1

Understanding the concept of filters in conv nets for computer vision

I am trying to understand the concept of filters in conv nets for computer vision. I understand what they do; for instance, they can be used to reduce the dimensionality of the input image, and so on. What I am stuck on is: where do these filters come from?

For example, I was watching a tutorial which showed that to detect a vertical line/edge we can use a 3x3 filter of the form [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]. How did we come up with this matrix? Even using Keras, I only had to pass the number of filters that I want to use:

model.add(Conv2D(64, (3, 3), padding='same', input_shape=x_train.shape[1:]))

where 64 is the number of filters I want to apply to the input. But how does Keras, or any other library, decide what numbers the filter matrices will hold? I am confused.
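For reference, here is how I can at least inspect the numbers that Keras put into those filters (a small sketch of my own; it assumes model.layers[0] is the Conv2D layer above and that the input has 3 channels):

# Inspect the filter values Keras created for the layer above.
# (Assumes model.layers[0] is the Conv2D layer and 3 input channels.)
kernel, bias = model.layers[0].get_weights()
print(kernel.shape)        # (3, 3, 3, 64): 64 filters, each 3x3x3
print(kernel[:, :, 0, 0])  # the first 3x3 slice of the first filter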

Say you have an input space of 1000 images, each 36x36x3, where 3 is the number of channels (one each for R, G, and B). This means we have 3 matrices representing each image; in total that is 1000 x 3 = 3000 matrices.
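In numpy terms (my own sketch, using the standard channels-last layout that Keras defaults to), that input space would look like:

import numpy as np

# 1000 RGB images of size 36x36, channels-last.
x = np.zeros((1000, 36, 36, 3))
print(x.shape)              # (1000, 36, 36, 3)
print(x[0, :, :, 0].shape)  # (36, 36): one channel matrix; 1000 * 3 in total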

Now, if I want to detect edges all over the image (by edges I mean the outlines of objects, to tell whether the image is a laptop or a phone), how does this really happen within the conv net? Is the concept of finding edges just abstract? In other words, does "finding edges" simply mean similar numbers/activations at similar positions between the matrix being evaluated and the labelled examples from the input space?
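To make my question concrete, here is what the tutorial's vertical-edge filter does when slid over a toy image (the image and the scipy call are just my own illustration, not from the tutorial):

import numpy as np
from scipy.signal import correlate2d

# The 3x3 vertical-edge filter from the tutorial.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# Toy 5x5 grayscale image: dark on the left, bright on the right,
# so there is a vertical edge between the two regions.
img = np.array([[0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9]])

# Deep-learning "convolution" is really cross-correlation,
# hence correlate2d rather than convolve2d.
response = correlate2d(img, kernel, mode='valid')
print(response)  # large values appear exactly where the edge is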

So, in conclusion, how does any machine learning library decide how to initialize these filters? Let's say, for our example, I want to apply 18x18x3 filters: what will these filter matrices look like? How do they get applied in the initial layers, and how do they get populated when used within a deep net?

Can anyone help me understand?

Thanks.

Upvotes: 0

Views: 340

Answers (1)

JimmyOnThePage

Reputation: 965

In short, the filters are initialised randomly. The convolutional net is then trained on a massive number of labelled images.
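For example, Keras initialises Conv2D kernels with the Glorot (Xavier) uniform scheme by default. A minimal numpy sketch of that sampling (the 3x3x3x64 shape is just the example from your question):

import numpy as np

# Glorot/Xavier uniform: Keras's default initialiser for Conv2D kernels.
kh, kw, c_in, c_out = 3, 3, 3, 64
fan_in = kh * kw * c_in    # inputs feeding each output value
fan_out = kh * kw * c_out  # outputs each input position feeds
limit = np.sqrt(6.0 / (fan_in + fan_out))

# Every filter starts life as small random numbers in [-limit, limit];
# training then reshapes them into edge/colour/texture detectors.
filters = np.random.uniform(-limit, limit, size=(kh, kw, c_in, c_out))
print(limit, filters.shape)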

Through the training process, the feature-extraction section (the conv filters) and the classification section (usually Dense layers, found after the conv filters) work in tandem to produce the best classification results on those images. Bad classification results cause the weights of the filters in the feature-extraction section to be altered in a specific manner (backpropagation). This process is repeated a huge number of times, after which the filters leading to the best classification performance are finally 'selected' to form part of the final model.
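As a toy illustration of one such update step (random data and a tiny made-up model, purely to show the filter values moving under backpropagation):

import tensorflow as tf

# Tiny made-up model: one conv layer feeding a 2-class head.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, (3, 3), padding='same', input_shape=(36, 36, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])
x = tf.random.normal((8, 36, 36, 3))                   # fake images
y = tf.random.uniform((8,), maxval=2, dtype=tf.int32)  # fake labels
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

before = model.layers[0].kernel.numpy().copy()
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))                         # forward pass
grads = tape.gradient(loss, model.trainable_variables)  # backpropagation
opt.apply_gradients(zip(grads, model.trainable_variables))

after = model.layers[0].kernel.numpy()
print('filters changed:', bool((before != after).any()))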

Edges are very important for image classification purposes, so the model 'learns' early on to identify them if it wants to classify the images correctly. While the process may seem random, CNN filters in the earlier layers therefore usually end up detecting edges and colours, since these lead to the best classification performance.

In deeper layers, the filters learn more complex features built from these simple edges and colours. This is the power of 'distributed learning' as done by CNNs and ANNs in general: learning functions of simple functions to create more complex functions.

Upvotes: 1
