elexhobby

Reputation: 2668

Weighted summation of embeddings in pytorch

I have a sequence of 12 words which I represent using a 12x256 matrix (using word embeddings). Let us refer to these as e_1, e_2, ..., e_12. I wish to take this as input and output a 1x256 vector. However, I don't want to use a (12x256) x 256 dense layer. Instead I want to create the output embedding using a weighted summation of the 12 embeddings:

e_out = w_1 * e_1 + w_2 * e_2 + ... + w_12 * e_12

where the w_i are scalars (thus there is weight sharing).

How can I create trainable w_i in PyTorch? I am new to it and only familiar with standard modules like nn.Linear.

Upvotes: 4

Views: 10191

Answers (2)

antoleb

Reputation: 323

You can implement this via a 1D convolution with kernel_size=1:

import torch

batch_size = 2

# inputs: (batch, num_words, embedding_dim)
inputs = torch.randn(batch_size, 12, 256)

# one output channel -> a single learnable weighted combination of the 12 word channels
aggregation_layer = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1)

weighted_sum = aggregation_layer(inputs)  # shape: (batch, 1, 256)

Such a convolution has 12 weights (plus a bias term unless you pass bias=False), and each weight corresponds to one of the w_i in the formula you provided.

In other words, this convolution runs along the dimension of size 256 and, at each position, sums the 12 embeddings with the learnable weights.
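If it helps to connect this back to the formula, here is a small sketch (my own check, assuming bias=False so the layer holds exactly the 12 scalars) showing that the convolution computes the same weighted sum as the explicit formula:

import torch

layer = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1, bias=False)
print(layer.weight.shape)  # torch.Size([1, 12, 1]) -> 12 scalar weights w_i

x = torch.randn(2, 12, 256)              # (batch, num_words, embedding_dim)
out = layer(x)                           # (batch, 1, 256)

# the same result computed as an explicit weighted sum over the 12 embeddings
w = layer.weight.view(1, 12, 1)          # broadcast each w_i over the 256 dims
manual = (x * w).sum(dim=1, keepdim=True)
print(torch.allclose(out, manual, atol=1e-6))  # True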

Upvotes: 7

erap129

Reputation: 960

This should do the trick for weighted avg:

from torch import nn
import torch


class LinearWeightedAvg(nn.Module):
    def __init__(self, n_inputs):
        super(LinearWeightedAvg, self).__init__()
        # one trainable scalar weight per input embedding
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(1)) for i in range(n_inputs)])

    def forward(self, input):
        # scale each embedding by its own scalar weight and accumulate the sum
        res = 0
        for emb_idx, emb in enumerate(input):
            res += emb * self.weights[emb_idx]
        return res


example_data = torch.rand(12, 256)
wa_layer = LinearWeightedAvg(12)
res = wa_layer(example_data)
print(res.shape)  # torch.Size([256])

Answer inspired by a previous answer I received in the pytorch forums:
https://discuss.pytorch.org/t/dense-layer-with-different-inputs-for-each-neuron/47348
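For what it's worth, here is a vectorized variant (a sketch, not part of the original answer; the class name LinearWeightedSum is just illustrative) that keeps all the scalars in a single nn.Parameter and lets broadcasting do the summation:

from torch import nn
import torch


class LinearWeightedSum(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        # one scalar weight per input embedding, shape (n_inputs, 1)
        self.weights = nn.Parameter(torch.randn(n_inputs, 1))

    def forward(self, input):
        # input: (n_inputs, embedding_dim); broadcast the weights and sum over the inputs
        return (input * self.weights).sum(dim=0)


layer = LinearWeightedSum(12)
print(layer(torch.rand(12, 256)).shape)  # torch.Size([256])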

Upvotes: 1
