bunny

Reputation: 123

RuntimeError: mat1 and mat2 shapes cannot be multiplied

I'm trying to input a 5D tensor with shape (1, 8, 32, 32, 32) to a VAE I wrote:

self.encoder = nn.Sequential(
    nn.Conv3d(8, 16, 4, 2, 1),   # 32 -> 16
    nn.BatchNorm3d(16),
    nn.LeakyReLU(0.2),

    nn.Conv3d(16, 32, 4, 2, 1),  # 16 -> 8
    nn.BatchNorm3d(32),
    nn.LeakyReLU(0.2),

    nn.Conv3d(32, 48, 4, 2, 1),  # 8 -> 4
    nn.BatchNorm3d(48),
    nn.LeakyReLU(0.2),
)

self.fc_mu = nn.Linear(3072, 100)  # 48*4*4*4 = 3072
self.fc_logvar = nn.Linear(3072, 100)
    
self.decoder = nn.Sequential(
    nn.Linear(100, 3072),
    nn.Unflatten(1, (48, 4, 4)),
    nn.ConvTranspose3d(48, 32, 4, 2, 1),  # 4 -> 8
    nn.BatchNorm3d(32),
    nn.Tanh(),

    nn.ConvTranspose3d(32, 16, 4, 2, 1),  # 8 -> 16
    nn.BatchNorm3d(16),
    nn.Tanh(),

    nn.ConvTranspose3d(16, 8, 4, 2, 1),   # 16 -> 32
    nn.BatchNorm3d(8),
    nn.Tanh(),
)

def reparametrize(self, mu, logvar):
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def encode(self, x):
    x = self.encoder(x)
    x = x.view(-1, x.size(1))

    mu = self.fc_mu(x)
    logvar = self.fc_logvar(x)

    return self.reparametrize(mu, logvar), mu, logvar

def decode(self, x):
    return self.decoder(x)

def forward(self, data):
    z, mu, logvar = self.encode(data)
    return self.decode(z), mu, logvar

The error I'm getting is: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x48 and 3072x100). I thought I had calculated the output dimensions of each layer correctly, but I must have made a mistake somewhere, and I can't see where.

Upvotes: 0

Views: 16776

Answers (1)

Natthaphon Hongcharoen

Reputation: 2430

This line

x = x.view(-1, x.size(1))

means you keep the second dimension (the channels) as-is and fold everything else into the first dimension (the batch).

Since the output of self.encoder is (1, 48, 4, 4, 4), doing that gives you (64, 48), but from the look of it you want (1, 3072) instead.
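A quick way to convince yourself of the arithmetic (a minimal sketch; the shape is just the encoder output from the question):

import torch

x = torch.randn(1, 48, 4, 4, 4)      # encoder output shape from the question
print(x.numel())                     # 3072 = 48*4*4*4 elements in total
print(x.view(-1, x.size(1)).shape)   # torch.Size([64, 48]): 3072 / 48 = 64 rows
print(x.view(x.size(0), -1).shape)   # torch.Size([1, 3072]): what fc_mu expects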

So this should solve this particular problem:

x = x.view(x.size(0), -1)
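If you'd rather keep the reshape inside the model, an nn.Flatten() module at the end of the encoder Sequential does the same thing (a sketch of an alternative, not what the question's code does; by default it flattens everything after the batch dimension):

import torch
import torch.nn as nn

flatten = nn.Flatten()           # start_dim=1 by default
x = torch.randn(1, 48, 4, 4, 4)
print(flatten(x).shape)          # torch.Size([1, 3072])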

Then you'll run into RuntimeError: unflatten: Provided sizes [48, 4, 4] don't multiply up to the size of dim 1 (3072) in the input tensor.

The cause is the Unflatten here:

nn.Linear(100, 3072),
nn.Unflatten(1, (48, 4, 4)),
nn.ConvTranspose3d(48, 32, 4, 2, 1)

The target shape has to be (48, 4, 4, 4) instead: the 3072 features need to unflatten into a channel dimension plus three spatial dimensions, because ConvTranspose3d expects a 5D input, and 48 * 4 * 4 * 4 = 3072.
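With that change the decoder round-trips back to the input shape. A quick check (a sketch; layer arguments copied from the question's decoder):

import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(100, 3072),
    nn.Unflatten(1, (48, 4, 4, 4)),       # 3072 = 48*4*4*4
    nn.ConvTranspose3d(48, 32, 4, 2, 1),  # 4 -> 8
    nn.BatchNorm3d(32),
    nn.Tanh(),
    nn.ConvTranspose3d(32, 16, 4, 2, 1),  # 8 -> 16
    nn.BatchNorm3d(16),
    nn.Tanh(),
    nn.ConvTranspose3d(16, 8, 4, 2, 1),   # 16 -> 32
    nn.BatchNorm3d(8),
    nn.Tanh(),
)

z = torch.randn(1, 100)
print(decoder(z).shape)  # torch.Size([1, 8, 32, 32, 32]): matches the input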

Upvotes: 4
