nyana ndev

Reputation: 1

Understanding the concept of filters in conv nets for computer vision

I am trying to understand the concept of filters in conv nets for computer vision. I understand what they do; for instance, they can be used to reduce the dimensionality of the input image, and so on. What I am stuck on is: where do these filters come from?

For example, I was watching a tutorial which showed that to detect a vertical line/edge we can use a 3x3 filter of the form [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]. How did we come up with this matrix? Even using Keras, I only had to pass the number of filters that I want to use:

model.add(Conv2D(64, (3, 3), padding='same', input_shape=x_train.shape[1:]))

where 64 is the number of filters I want to apply to the input. But how does Keras, or any other library, decide what numbers the filter matrices will hold? I am confused.
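For reference, here is how I can at least inspect the numbers that Keras put into those filters (a small sketch of my own; it assumes model.layers[0] is the Conv2D layer above and that the input has 3 channels):

# Inspect the filter values Keras created for the layer above.
# (Assumes model.layers[0] is the Conv2D layer and 3 input channels.)
kernel, bias = model.layers[0].get_weights()
print(kernel.shape)        # (3, 3, 3, 64): 64 filters, each 3x3x3
print(kernel[:, :, 0, 0])  # the first 3x3 slice of the first filter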

Say you have an input space of 1000 images, each 36x36x3, where 3 is the number of channels (one each for R, G, and B). This means we have 3 matrices representing each image; in total that is 1000 x 3 = 3000 matrices.
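In numpy terms (my own sketch, using the standard channels-last layout that Keras defaults to), that input space would look like:

import numpy as np

# 1000 RGB images of size 36x36, channels-last.
x = np.zeros((1000, 36, 36, 3))
print(x.shape)              # (1000, 36, 36, 3)
print(x[0, :, :, 0].shape)  # (36, 36): one channel matrix; 1000 * 3 in total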

Now, if I want to detect edges all over the image (by edges I mean the outlines of objects, to tell whether the image is a laptop or a phone), how does this really happen within the conv net? Is the concept of finding edges just abstract? In other words, does "finding edges" simply mean similar numbers/activations at similar positions between the matrix being evaluated and the labelled examples from the input space?
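To make my question concrete, here is what the tutorial's vertical-edge filter does when slid over a toy image (the image and the scipy call are just my own illustration, not from the tutorial):

import numpy as np
from scipy.signal import correlate2d

# The 3x3 vertical-edge filter from the tutorial.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# Toy 5x5 grayscale image: dark on the left, bright on the right,
# so there is a vertical edge between the two regions.
img = np.array([[0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9]])

# Deep-learning "convolution" is really cross-correlation,
# hence correlate2d rather than convolve2d.
response = correlate2d(img, kernel, mode='valid')
print(response)  # large values appear exactly where the edge is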

So, in conclusion, how does any machine learning library decide how to initialize these filters? Let's say, for our example, I want to apply 18x18x3 filters: what will these filter matrices look like? How do they get applied in the initial layers, and how do they get populated when used within a deep net?

Can anyone help me understand?

Thanks.

Upvotes: 0

Views: 340

Answers (1)

JimmyOnThePage

Reputation: 965

In short, the filters are initialised randomly. The convolutional net is then trained on a massive number of labelled images.
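For example, Keras initialises Conv2D kernels with the Glorot (Xavier) uniform scheme by default. A minimal numpy sketch of that sampling (the 3x3x3x64 shape is just the example from your question):

import numpy as np

# Glorot/Xavier uniform: Keras's default initialiser for Conv2D kernels.
kh, kw, c_in, c_out = 3, 3, 3, 64
fan_in = kh * kw * c_in    # inputs feeding each output value
fan_out = kh * kw * c_out  # outputs each input position feeds
limit = np.sqrt(6.0 / (fan_in + fan_out))

# Every filter starts life as small random numbers in [-limit, limit];
# training then reshapes them into edge/colour/texture detectors.
filters = np.random.uniform(-limit, limit, size=(kh, kw, c_in, c_out))
print(limit, filters.shape)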

Through the training process, the feature-extraction section (the conv filters) and the classification section (usually Dense layers, found after the conv filters) work in tandem to produce the best classification results on those images. Bad classification results cause the weights of the filters in the feature-extraction section to be altered in a specific manner (backpropagation). This process is repeated a huge number of times, after which the filters leading to the best classification performance are finally 'selected' to form part of the final model.
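As a toy illustration of one such update step (random data and a tiny made-up model, purely to show the filter values moving under backpropagation):

import tensorflow as tf

# Tiny made-up model: one conv layer feeding a 2-class head.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, (3, 3), padding='same', input_shape=(36, 36, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])
x = tf.random.normal((8, 36, 36, 3))                   # fake images
y = tf.random.uniform((8,), maxval=2, dtype=tf.int32)  # fake labels
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

before = model.layers[0].kernel.numpy().copy()
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))                         # forward pass
grads = tape.gradient(loss, model.trainable_variables)  # backpropagation
opt.apply_gradients(zip(grads, model.trainable_variables))

after = model.layers[0].kernel.numpy()
print('filters changed:', bool((before != after).any()))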

Edges are very important for image classification purposes, so the model 'learns' early on to identify them if it wants to classify the images correctly. While the process may seem random, CNN filters in the earlier layers therefore usually end up detecting edges and colours, since these lead to the best classification performance.

In deeper layers, the filters learn more complex features built from these simple edges and colours. This is the power of 'distributed learning' as done by CNNs and ANNs in general: learning functions of simple functions to create more complex functions.

Upvotes: 1
