samje

Reputation: 23

How to keep track of hidden states for different input shapes

I defined an RNN "by hand", composed of multiple linear layers with pruned connections.

To keep track of the hidden states, I have a variable next_hidden_states in which I save the hidden states at time t, to re-use them at time t+1. This variable is of size (batch_size, N).

During training/evaluation, I would like to be able to evaluate the model on inputs with a batch dimension (to train the agent) or without one (to run an episode in the environment). This is usually possible with standard PyTorch modules, as the batch dimension is implicit...
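
For instance, a plain nn.Linear already behaves like this: the same layer accepts a batched or an unbatched input.

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
print(layer(torch.zeros(8, 4)).shape)  # torch.Size([8, 4]) -- batched input
print(layer(torch.zeros(4)).shape)     # torch.Size([4])    -- single sample, batch dim omitted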

I thought about passing next_hidden_states as an argument to the network and returning it as an output, but that is quite inelegant.
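
Concretely, that option would look roughly like the sketch below (the class and names are made up for illustration): every caller has to thread the state through the forward call.

import torch
import torch.nn as nn

class StatelessCell(nn.Module):
    # Hypothetical minimal cell: the hidden state is not stored on the module,
    # it is passed in by the caller and handed back after each step.
    def __init__(self, n_in=4, n_hidden=12):
        super().__init__()
        self.input_layer = nn.Linear(n_in, n_hidden)
        self.recurrent_layer = nn.Linear(n_hidden, n_hidden, bias=False)

    def forward(self, x, hidden_states):
        next_hidden_states = torch.sigmoid(self.input_layer(x) + self.recurrent_layer(hidden_states))
        return next_hidden_states

cell = StatelessCell()
h = cell(torch.zeros(4), torch.zeros(12))           # unbatched call
h_batch = cell(torch.zeros(8, 4), torch.zeros(8, 12))  # batched call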

Edit

Here is a minimal version of my code

import numpy as np

import torch
import torch.nn.utils.prune as prune
import torch.nn as nn


class BrainRNN(nn.Module):
    def __init__(self, activation=torch.sigmoid, batch_size=8):
        super(BrainRNN, self).__init__()
        self.n_neurons = 3*4
        self.activation = activation
        self.batch_size = batch_size
        self.reset_hidden_states()

        # Create the input layer
        self.input_layer = nn.Linear(4, 4)

        # Create forward hidden layers
        self.hidden_layers = nn.ModuleList([])
        new_layer = nn.Linear(4,4)
        mask = np.ones((4,4))-np.eye(4)
        prune.custom_from_mask(new_layer, name='weight', mask=torch.tensor(mask.T)) # delete fictive connections
        self.hidden_layers.append(new_layer)

        # Create the backward weights
        self.recurrent_layers = nn.ModuleList([]) # recurrent_layers[i](hidden_states) = layer j>i to i

        new_layer = nn.Linear(self.n_neurons, 4, bias=False) # no bias for backward connection
        mask = np.zeros((12,4))
        mask[1,0] = 1
        prune.custom_from_mask(new_layer, name='weight', mask=torch.tensor(mask.T)) # delete fictive connections
        self.recurrent_layers.append(new_layer)

        # Create the output layer
        self.output_layer = nn.Linear(4,4)

    def forward(self, x):
        next_hidden_states = torch.empty(x.shape[0], self.n_neurons) if x.dim() > 1 else torch.empty(self.n_neurons)
        skips = [] # list of current states for skip connections

        # Input layer
        x = self.activation(self.input_layer(x) + self.recurrent_layers[0](self.hidden_states))
        next_hidden_states[...,[0,1,2,3]] = x

        # Hidden layers
        x = self.hidden_layers[0](x)
        x = self.activation(x)
        next_hidden_states[...,[4,5,6,7]] = x

        # Output layer
        x = self.output_layer(x) # no activation nor recurrent/skip connection for the last one
        
        self.hidden_states = next_hidden_states

        return x

    def reset_hidden_states(self, hidden_states=None):
        if self.batch_size > 0:
            self.hidden_states = nn.init.normal_(torch.empty(self.n_neurons), std=1).repeat(self.batch_size,1) # same hidden states for all batches
        else:
            self.hidden_states = nn.init.normal_(torch.empty(self.n_neurons), std=1)

net = BrainRNN()
net(torch.zeros(8,4)) # works well
net(torch.zeros(4)) # shape issue at next_hidden_states[...,[0,1,2,3]] = x

Here there are 3 layers of 4 nodes each, with a recurrent connection from the hidden layer back to the input layer, and some pruned connections.

The aim is to be able, given net = BrainRNN(...), to evaluate net(torch.zeros((B,4))) as well as net(torch.zeros(4)). Ideally, I would like to reproduce the behavior of standard nn.Modules, but I don't really know how to do so while saving the states...

Upvotes: 0

Views: 107

Answers (1)

MuhammedYunus

Reputation: 5095

Below is a simple RNN that accepts data as (sequence_length, n_features) or (batch_size, sequence_length, n_features). It steps through the entire sequence and returns the outputs and hidden states for each step (it also stores them as attributes which you can access). Is this the sort of functionality you were after? There's no pruning, but you could add that in as in your original code.

import torch
from torch import nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size=4, output_size=2, activation='tanh', batch_first=True):
        super().__init__()
        
        #Only support batch_first=True (as per OP's test data)
        assert batch_first, 'This model assumes batch_first=True for simplicity'

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.activation_fn = getattr(torch.nn.functional, activation)
        
        self.Wxh = nn.Linear(self.input_size, self.hidden_size)
        self.Whh = nn.Linear(self.hidden_size, self.hidden_size)
        self.Why = nn.Linear(self.hidden_size, output_size)
        
    def forward(self, x):
        x = x.clone()
        
        x_ndim_orig = x.ndim
        
        #If it's 2D, assume that means (sequence_length, n_features,)
        # and prepend batch
        if x.ndim == 2:
            print('X.ndim is 2 | Assuming X.shape is (sequence_length, n_features)')
            x = x.unsqueeze(dim=0)
        elif x.ndim == 3:
            print('X.ndim is 3 | Assuming X.shape is (batch_size, sequence_length, n_features)')
        
        #Record the hidden state and y at each step for input x
        hidden_states = []
        outputs = []
        
        batch_size, sequence_len, n_features = x.shape
        assert self.input_size == n_features, f'Expected input features size of {self.input_size}'
        
        #Initialise hidden_state to 0, and step through the sequence recurrently
        hidden_state = torch.zeros(batch_size, self.hidden_size)
        for frame_idx in range(sequence_len):
            frame = x[:, frame_idx, :] #(batch, n_features) for this timestep
            
            hidden_state = self.activation_fn(
                self.Wxh(frame) + self.Whh(hidden_state)
            )
            output = self.activation_fn(self.Why(hidden_state))
            
            #Record the hidden state and y for this frame
            hidden_states.append(hidden_state)
            outputs.append(output)
        
        #Stack into (batch_size, sequence_length, output_size/hidden_size)
        # Available as attributes
        self.outputs = torch.stack(outputs, dim=1)
        self.hidden_states = torch.stack(hidden_states, dim=1)
        
        #Optionally drop the batch dim that we added
        if x_ndim_orig == 2:
            self.outputs, self.hidden_states = self.outputs[0], self.hidden_states[0]
        
        return self.outputs, self.hidden_states

Test the shapes:

#Input:  (sequence_length=12, n_features=4)
#Output: (sequence_length=12, hidden_size)
x = torch.rand(12, 4)
outputs, hidden_states = SimpleRNN(input_size=4)(x)
print(hidden_states.shape)

#Input: (batch_size=32, sequence_length=12, n_features=4)
#Output: (batch_size=32, sequence_length=12, hidden_size)
x = torch.rand(32, 12, 4)
outputs, hidden_states = SimpleRNN(input_size=4)(x)
print(hidden_states.shape)

This prints:

X.ndim is 2 | Assuming X.shape is (sequence_length, n_features)
torch.Size([12, 4])

X.ndim is 3 | Assuming X.shape is (batch_size, sequence_length, n_features)
torch.Size([32, 12, 4])
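
Since the forward pass also stores the stacked outputs and hidden states as attributes, you can read them back after a call, for example:

model = SimpleRNN(input_size=4)
outputs, hidden_states = model(torch.rand(32, 12, 4))

# The same tensors are kept on the module after the call
print(model.outputs.shape)        # torch.Size([32, 12, 2]) -- output_size defaults to 2
print(model.hidden_states.shape)  # torch.Size([32, 12, 4]) -- hidden_size defaults to 4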

The RNN is untested, and is meant to illustrate how you can do the recurrence inside the class & store the hidden states.
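
If you do want the pruned connectivity from your original code, one option (a sketch, untested during training) is to apply prune.custom_from_mask to the hidden-to-hidden weights after building the model, with a mask shaped like model.Whh.weight. The mask below is just an example, analogous to the ones in the question.

import torch
import torch.nn.utils.prune as prune

model = SimpleRNN(input_size=4, hidden_size=4)

# Example mask: drop the diagonal hidden-to-hidden connections
# (shape must match model.Whh.weight, here (4, 4))
mask = torch.ones(4, 4) - torch.eye(4)
prune.custom_from_mask(model.Whh, name='weight', mask=mask)

outputs, hidden_states = model(torch.rand(12, 4))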

Upvotes: 0
