Alaa Grable

Reputation: 101

BERT Based CNN - Convolution and Maxpooling

I'm trying to fine-tune a pre-trained BERT model (Hugging Face transformers) by inserting a CNN layer. In this model, the outputs of all transformer encoders are used, not only the output of the last transformer encoder. The output vectors of each transformer encoder are concatenated, so that a matrix is produced:

The convolution is performed with a window of size (3, hidden size of BERT, which is 768 in the BERT_base model), and the maximum value is generated for each transformer encoder by applying max pooling to the convolution output.

By concatenating these values, a vector is generated, which is fed into a fully connected network. Classification is then performed by applying softmax.

[Figure: architecture diagram of the stacked transformer encoder outputs feeding the convolution, max pooling, and fully connected layers]

My problem is that I can’t seem to find the right arguments to perform the convolution and the maxpooling on that matrix.

With batch size = 32, there are 13 layers of transformer encoders; each one gets as input a [64, 768] encoding of the tokenized text and outputs an encoding of the same dimensions (64 is the max length used in tokenization).

I want to perform the convolution on each transformer's output matrix ([64, 768]) separately, then apply max pooling to that convolution's output. I should then get one max value per transformer encoder, and these max values are fed into the fully connected network.
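To make the goal concrete, here is a toy shape check with random tensors standing in for the 13 encoder outputs (the sizes come from the setup above; 62 = 64 - 3 + 1 is what a (3, 768) window without padding should leave):

import torch

# stand-ins for the 13 encoder outputs, each [batch=32, seq_len=64, hidden=768]
all_layers = [torch.randn(32, 64, 768) for _ in range(13)]
print(len(all_layers), all_layers[0].shape)  # 13 torch.Size([32, 64, 768])

# what I want per encoder output:
#   convolution with a (3, 768) window -> a feature map of shape [62, 1] (no padding)
#   max pooling over that feature map  -> a single value
# concatenating the 13 values should give a [32, 13] tensor for the fully connected layer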

My code is:

import torch
import torch.nn as nn
from transformers import BertModel

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.conv = nn.Conv2d(in_channels=13, out_channels=13, kernel_size= (3, 768), padding=True) 
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=768, stride=1)
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(9118464, 3)
        self.flat = nn.Flatten()
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):
        _, _, all_layers = self.bert(sent_id, attention_mask=mask, output_hidden_states=True)
        # all_layers  = [32, 13, 64, 768]
        x = torch.cat(all_layers, 0) # x= [416, 64, 768]
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.flat(x)
        x = self.fc(x)
        return self.softmax(x)

I keep getting an error saying that the convolution expected an input with a certain number of dimensions but got a different one.

<generator object BERT_Arch.forward.<locals>.<genexpr> at 0x7fbeffc2d200>
torch.Size([416, 64, 768])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-3a2c2cd7c02d> in <module>()
    362 
    363         # train model
--> 364         train_loss, _ = train()
    365 
    366         # evaluate model

5 frames
<ipython-input-12-3a2c2cd7c02d> in train()
    148 
    149         # get model predictions for the current batch
--> 150         preds = model(sent_id, mask)
    151 
    152         # compute the loss between actual and predicted values

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-12-3a2c2cd7c02d> in forward(self, sent_id, mask)
     42         x = torch.cat(all_layers, 0) # torch.Size([13, 32, 64, 768])
     43         print(x.shape)
---> 44         x = self.conv(x)
     45         x = self.relu(x)
     46         x = self.pool(x)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    421 
    422     def forward(self, input: Tensor) -> Tensor:
--> 423         return self._conv_forward(input, self.weight)
    424 
    425 class Conv3d(_ConvNd):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
    418                             _pair(0), self.dilation, self.groups)
    419         return F.conv2d(input, weight, self.bias, self.stride,
--> 420                         self.padding, self.dilation, self.groups)
    421 
    422     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [13, 13, 3, 768], but got 3-dimensional input of size [416, 64, 768] instead

I tried different values for the convolution arguments, but I still got similar errors. Sometimes the error says that the max pooling output size is too small:

Given input size: (64x62x1). Calculated output size: (64x31x0). Output size is too small

and sometimes this one (after changing the arguments of the CNN layer):

RuntimeError: Given groups=1, weight of size [32, 32, 3, 3], expected input[13, 4, 64, 768] to have 32 channels, but got 4 channels instead

or

Expected input batch_size (X) to match target batch_size (Y)

How can I set up this CNN layer correctly? I would be grateful for any help.

Upvotes: 2

Views: 1400

Answers (1)

abe

Reputation: 987

I advise you to read the documentation thoroughly: PyTorch Conv2D

Basically, it expects the input to be a 4D tensor with shape (batch size, # of input channels, height, width). It outputs another 4D tensor with shape (batch size, # of output channels, height, width), where the output height and width depend on the kernel size and padding. Your in_channels argument is the # of input channels, and out_channels is the # of output channels. The layer has out_channels different kernels, each with shape (in_channels, kernel_size[0], kernel_size[1]), so its weight tensor is 4D with shape (out_channels, in_channels, kernel_size[0], kernel_size[1]).
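For illustration, here is a minimal shape sketch with dummy tensors in place of the real BERT hidden states (the sizes 32, 13, 64, 768 and the (3, 768) window are taken from the question; stacking the hidden states along a new channel dimension is one way to produce the 4D input Conv2d expects):

import torch
import torch.nn as nn

batch, n_layers, seq_len, hidden = 32, 13, 64, 768

# dummy stand-ins for the 13 hidden-state tensors returned by BERT
hidden_states = [torch.randn(batch, seq_len, hidden) for _ in range(n_layers)]

# stack along a new channel dimension -> (batch, in_channels, height, width)
x = torch.stack(hidden_states, dim=1)                    # [32, 13, 64, 768]

conv = nn.Conv2d(in_channels=n_layers, out_channels=n_layers, kernel_size=(3, hidden))
print(conv.weight.shape)                                 # [13, 13, 3, 768] = (out_ch, in_ch, kH, kW)

feat = torch.relu(conv(x))                               # [32, 13, 62, 1]

# max over the remaining spatial positions -> one value per channel
pooled = feat.flatten(2).max(dim=2).values               # [32, 13]

logits = nn.Linear(n_layers, 3)(pooled)                  # [32, 3]
print(feat.shape, pooled.shape, logits.shape)

Compare the printed weight shape with the [13, 13, 3, 768] weight in your error message: the mismatch there was only in the input, which had 3 dimensions instead of the expected 4.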

Upvotes: 0
