Reputation: 102
So in 2D convolution, when I define a 3x3 kernel, the operation is actually carried out with a 3x3xn kernel, where n is the number of input channels.
Is this the same in 3D convolution? That is to say, if I define a 3x3x3 kernel on an input of dimensions (128,128,128,3) (width, height, depth, channels), is the operation carried out with a kernel of dimensions 3x3x3x3, where the last 3 comes from the number of input channels?
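For concreteness, here is a small sketch of what I mean, assuming a PyTorch-style Conv2d/Conv3d layer (the framework choice is just for illustration):

```python
import torch
import torch.nn as nn

# 2D case: a "3x3" kernel on a 3-channel input actually has weight shape
# (out_channels, in_channels, 3, 3), i.e. it spans all input channels.
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv2d.weight.shape)  # torch.Size([8, 3, 3, 3])

# 3D case: is a "3x3x3" kernel on a 3-channel volumetric input
# really (out_channels, in_channels, 3, 3, 3) under the hood?
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)
print(conv3d.weight.shape)  # torch.Size([8, 3, 3, 3, 3])

# Dummy input shaped (batch, channels, depth, height, width),
# matching a (128, 128, 128, 3) volume in channels-first layout.
x = torch.randn(1, 3, 128, 128, 128)
print(conv3d(x).shape)  # torch.Size([1, 8, 126, 126, 126])
```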
Upvotes: 1
Views: 195
Reputation: 419
This is a good question. 3D cameras work by capturing two flat images side by side. I'm not sure how it would look in tensor form, but you would need the typical 1080x1080x3 dimensions for one photo and the same for the other, and the two would have to be associated with each other somehow. Facebook actually just released a new library for this type of operation, called PyTorch3D.
Upvotes: 1