Joe Rakhimov

Reputation: 5083

How to change a PyTorch model to work with 3d input instead of 2d input?

I am trying to train an agent to play the Connect4 game. I found an example of how it can be trained. The representation of the board is a 1x6x7 array:

[[[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 2]
  [0 0 0 0 0 0 1]]]
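
For context, such an encoding can be built along these lines (a sketch only; it assumes the board arrives as a flat, row-major list of 42 cells with 0 = empty, 1 = agent 1, 2 = agent 2, as in Kaggle's ConnectX, which may differ from the actual wrapper):

import numpy as np

def encode_board(board, rows=6, cols=7):
    # Reshape the flat cell list into (rows, cols), then add a channel
    # axis so the result has shape (1, 6, 7).
    grid = np.asarray(board, dtype=np.float32).reshape(rows, cols)
    return grid[None, :, :]

# Example: the position shown above, with agent 2's piece at row 4 and
# agent 1's piece at row 5 of the last column.
flat = [0] * 42
flat[4 * 7 + 6] = 2
flat[5 * 7 + 6] = 1
print(encode_board(flat).shape)  # (1, 6, 7)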

This neural network architecture is used:

import gym
import torch as th
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class Net(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 256):
        super(Net, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with th.no_grad():
            n_flatten = self.cnn(th.as_tensor(observation_space.sample()[None]).float()).shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))
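
For reference, the extractor can be exercised stand-alone with a dummy observation space (a minimal sketch outside of stable-baselines3's actual training loop):

import gym
import numpy as np
import torch as th

# Dummy observation space matching the 1x6x7 board encoding.
obs_space = gym.spaces.Box(low=0, high=2, shape=(1, 6, 7), dtype=np.float32)
net = Net(obs_space, features_dim=256)

# Add a batch dimension: (1, 1, 6, 7).
obs = th.as_tensor(obs_space.sample()[None]).float()
print(net(obs).shape)  # torch.Size([1, 256])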

It scored reasonably well in games against Agent 2, which moves randomly:

Agent 1 Win Percentage: 0.59 
Agent 2 Win Percentage: 0.38 
Number of Invalid Plays by Agent 1: 3 
Number of Invalid Plays by Agent 2: 0
Number of Draws (in 100 game rounds): 0

Here, a 3-layer representation was suggested as one of the ways the agent can be improved:

[Image: suggested 3-layer board representation]

I tried to implement it, and this is an example of the new 3-layer representation of the board:

[[[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 1]]

 [[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 1]
  [0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0]
  [0 0 0 0 0 0 1]
  [0 0 0 0 0 0 0]
  [1 1 1 1 1 1 0]]]
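
Such a 3-channel encoding can be produced along these lines (a sketch of one possible implementation, consistent with the example above: channel 0 marks agent 1's pieces, channel 1 marks agent 2's pieces, and channel 2 marks the cell where the next piece would land in each column; my actual code may differ):

import numpy as np

def encode_board_3ch(board, rows=6, cols=7):
    grid = np.asarray(board).reshape(rows, cols)
    obs = np.zeros((3, rows, cols), dtype=np.float32)
    obs[0] = (grid == 1)  # channel 0: agent 1's pieces
    obs[1] = (grid == 2)  # channel 1: agent 2's pieces
    for c in range(cols):
        empties = np.where(grid[:, c] == 0)[0]
        if len(empties) > 0:
            # channel 2: lowest empty cell, i.e. where the next piece lands
            obs[2, empties[-1], c] = 1.0
    return obs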

When I run this with the current neural network architecture, the agent fails to train properly:

Agent 1 Win Percentage: 0.0
Agent 2 Win Percentage: 0.0
Number of Invalid Plays by Agent 1: 100
Number of Invalid Plays by Agent 2: 0
Number of Draws (in 100 game rounds): 0

Here you can see my code.

As you can see, I now have 3 layers instead of one. That is why I tried to use Conv3d:

class Net(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 256):
        super(Net, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv3d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv3d(64, 128, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with th.no_grad():
            n_flatten = self.cnn(th.as_tensor(observation_space.sample()[None]).float()).shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

When I try to run this code, it shows this error:

RuntimeError: Expected 5-dimensional input for 5-dimensional weight [32, 1, 3, 3, 3], but got 4-dimensional input of size [1, 3, 6, 7] instead

My question: how can I use a Conv3D layer with a 3x6x7-shaped input?

Upvotes: 1

Views: 982

Answers (1)

David

Reputation: 331

The comment from Shai is correct: you do not need a Conv3D layer here. Conv3d expects a 5-dimensional input (batch, channels, depth, height, width), while your observation is only 4-dimensional, which is exactly what the error message says. Beyond that, the shape of your Conv3D filters would violate the size calculation after applying a convolutional filter by reducing at least one dimension to less than 1 (you cannot multiply by a value that does not exist).

Simply using the original model implementation should work for you.
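
As a quick sanity check, a minimal sketch (with a hypothetical dummy observation space) shows that the original Conv2d stack accepts a 3x6x7 observation unchanged, since n_input_channels simply becomes 3:

import gym
import numpy as np
import torch as th
import torch.nn as nn

# Dummy 3-channel observation space; n_input_channels becomes 3.
obs_space = gym.spaces.Box(low=0, high=1, shape=(3, 6, 7), dtype=np.float32)
cnn = nn.Sequential(
    nn.Conv2d(obs_space.shape[0], 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0),
    nn.ReLU(),
    nn.Flatten(),
)
with th.no_grad():
    out = cnn(th.as_tensor(obs_space.sample()[None]).float())
print(out.shape)  # torch.Size([1, 2560]), i.e. 128 * 4 * 5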

Similar to images with 3 color bands, these are typically not processed with Conv3d (hyperspectral imagery may be a different case, but that is not relevant here). There is some discussion about how to treat each of the color bands, and you can influence this in a variety of ways.

For example, adjusting the groups argument of the Conv2d layer at instantiation will change the connections between the layer's in_channels and out_channels, i.e. which inputs are convolved to which outputs, as per the documentation: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
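
For instance, a small sketch of how groups rewires the channels (layer sizes here are illustrative only, not taken from your model):

import torch.nn as nn

# Both layers map 3 input channels to 30 output channels, but groups=3
# gives each input channel its own private set of 10 filters.
full = nn.Conv2d(3, 30, kernel_size=3, groups=1)
grouped = nn.Conv2d(3, 30, kernel_size=3, groups=3)
print(full.weight.shape)     # torch.Size([30, 3, 3, 3])
print(grouped.weight.shape)  # torch.Size([30, 1, 3, 3])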

You may be able to optimize this in your model, or otherwise experiment with it.

In any case, simply using the existing implementation with Conv2d should be fine for you. Conv3d is typically used with 3 spatial dimensions, or sometimes 2 spatial and 1 temporal dimension. While your case is somewhat like a limited version of 3 spatial dimensions, it is not the same as, say, a 3D vector field of fluid flow, where each "pixel" has some spatial relevance/correlation to its neighboring "pixels". Your "spatial pixels" have a somewhat different kind of relevance or correlation mapping than that.
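
For completeness, a sketch of the shape Conv3d actually expects, purely to illustrate the error message rather than as a recommendation for this problem: a 5-dimensional (batch, channels, depth, height, width) tensor, which for your observation would mean adding a singleton channel dimension.

import torch as th
import torch.nn as nn

# Conv3d wants (N, C, D, H, W). Treating the 3 board planes as a depth
# axis of size 3 means adding a singleton channel dimension first.
conv = nn.Conv3d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
x = th.zeros(1, 3, 6, 7)   # what the question passed in: (N, C, H, W)
x5d = x.unsqueeze(1)       # (1, 1, 3, 6, 7)
print(conv(x5d).shape)     # torch.Size([1, 32, 3, 6, 7])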

Upvotes: 1
