Reputation: 11
I am trying to train a model with the following architecture:
self.lstm1 = nn.LSTM(in_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm1 = nn.BatchNorm1d(hidden_channels)
self.lstm2 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm2 = nn.BatchNorm1d(hidden_channels)
self.lstm3 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm3 = nn.BatchNorm1d(hidden_channels)
self.fc1 = nn.Linear(hidden_channels, out_channels)
in_channels = 105
hidden_channels = 128
batch_size = 32
The dimension of the data is (32, 770, 105).
The dimension of the output from lstm1 is (32, 770, 128).
When I train the network, I get this error at the batchnorm1 layer:
RuntimeError: running_mean should contain 770 elements not 128
Can you let me know where the mistake is?
I tried to use permute to change the output dimension from (32, 770, 128) to (32, 128, 770), but I still got a different error.
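For reference, the mismatch can be reproduced outside the full model. This is a minimal standalone sketch (not the original training code, assuming only torch is installed): with batch_first=True the LSTM output is (batch, seq_len, hidden), but BatchNorm1d(128) expects 128 elements in dimension 1.

```python
import torch
import torch.nn as nn

x = torch.randn(32, 770, 105)            # (batch, seq_len, in_channels)
lstm1 = nn.LSTM(105, 128, batch_first=True)
bn1 = nn.BatchNorm1d(128)                # expects 128 channels in dim 1

out, _ = lstm1(x)                        # out: (32, 770, 128)
try:
    bn1(out)                             # dim 1 is 770, not 128 -> fails
except RuntimeError as e:
    err_msg = str(e)                     # mentions 770 vs 128
```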
Upvotes: 0
Views: 37
Reputation: 539
nn.BatchNorm1d treats the second dimension of a 3-D input as the channel dimension. With batch_first=True your tensors have shape (batch, seq_len, features), so the layer sees 770 channels — the sequence length — rather than the 128 hidden features. (Note that the order in which you declare modules in __init__ is irrelevant; only the order of calls in forward() matters.) If you want to apply BatchNorm1d to the batch_first tensor as-is, its size must match the sequence length:
seq_len = 770
self.batchnorm1 = nn.BatchNorm1d(seq_len)
self.lstm1 = nn.LSTM(in_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm2 = nn.BatchNorm1d(hidden_channels)
self.lstm2 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm3 = nn.BatchNorm1d(hidden_channels)
self.lstm3 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.fc1 = nn.Linear(hidden_channels, out_channels)
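Alternatively, if the intent is to normalize over the 128 hidden features rather than over the 770 time steps, you can keep BatchNorm1d(hidden_channels) and permute around the call in forward(), permuting back before the next LSTM — forgetting that second permute is a likely source of the "different error" mentioned in the question. A minimal sketch (standalone, not the original class):

```python
import torch
import torch.nn as nn

hidden_channels = 128
lstm1 = nn.LSTM(105, hidden_channels, batch_first=True)
bn1 = nn.BatchNorm1d(hidden_channels)    # normalizes the 128 features

x = torch.randn(32, 770, 105)            # (batch, seq_len, in_channels)
out, _ = lstm1(x)                        # (32, 770, 128)
# BatchNorm1d wants channels in dim 1: permute to (32, 128, 770),
# normalize, then permute back so the next LSTM sees (batch, seq, features).
out = bn1(out.permute(0, 2, 1)).permute(0, 2, 1)
```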
Upvotes: 0