Reputation: 2668
I have a sequence of 12 words which I represent using a 12x256 matrix (using word embeddings). Let us refer to these as e_1, ..., e_12. I wish to take this as input and output a 1x256 vector. However I don't want to use a (12x256) x 256 dense layer. Instead I want to create the output embedding using a weighted summation of the 12 embeddings:
e_out = w_1*e_1 + w_2*e_2 + ... + w_12*e_12
where the w_i's are scalars (thus there is weight sharing).
How can I create trainable w_i's in PyTorch? I am new and only familiar with the standard modules like nn.Linear.
Upvotes: 4
Views: 10191
Reputation: 323
You can implement this via a 1D convolution with kernel_size = 1:
import torch
batch_size = 2
inputs = torch.randn(batch_size, 12, 256)
# Treat the 12 embeddings as input channels; a kernel of size 1 reduces
# them to a single weighted combination at each of the 256 positions.
aggregation_layer = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1)
weighted_sum = aggregation_layer(inputs)  # shape: (batch_size, 1, 256)
Such a convolution will have 12 weight parameters (plus a bias, which you can disable with bias=False). Each weight will be equal to one w_i in the formula you provided.
In other words, this convolution runs over the dimension of size 256 and sums the 12 channels with learnable weights.
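As a quick sanity check (a minimal sketch of my own, not part of the answer above; bias=False is assumed so the layer is exactly the weighted sum), you can compare the layer's output against a manual weighted sum of the 12 rows:
import torch
inputs = torch.randn(2, 12, 256)
agg = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1, bias=False)
out = agg(inputs)                              # (2, 1, 256)
w = agg.weight.view(12)                        # the 12 learnable scalars w_i
manual = (inputs * w.view(1, 12, 1)).sum(dim=1, keepdim=True)
print(out.shape)                               # torch.Size([2, 1, 256])
print(torch.allclose(out, manual, atol=1e-6))  # True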
Upvotes: 7
Reputation: 960
This should do the trick for a weighted sum:
from torch import nn
import torch
class LinearWeightedAvg(nn.Module):
    def __init__(self, n_inputs):
        super(LinearWeightedAvg, self).__init__()
        # one trainable scalar weight per input embedding
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(1)) for i in range(n_inputs)])

    def forward(self, input):
        # input: iterable of n_inputs embeddings, e.g. a tensor of shape (12, 256)
        res = 0
        for emb_idx, emb in enumerate(input):
            res += emb * self.weights[emb_idx]
        return res

example_data = torch.rand(12, 256)
wa_layer = LinearWeightedAvg(12)
res = wa_layer(example_data)
print(res.shape)  # torch.Size([256])
Answer inspired by a previous answer I received in the PyTorch forums:
https://discuss.pytorch.org/t/dense-layer-with-different-inputs-for-each-neuron/47348
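If you prefer a single parameter tensor over a ParameterList, here is a minimal vectorized sketch (my own variant, not from the linked thread; the class name WeightedSum and the use of einsum are assumptions) that also handles a leading batch dimension:
from torch import nn
import torch

class WeightedSum(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        # one trainable scalar per embedding, stored as a single vector
        self.weights = nn.Parameter(torch.randn(n_inputs))

    def forward(self, x):
        # x: (..., n_inputs, dim) -> weighted sum over the n_inputs axis
        return torch.einsum('...nd,n->...d', x, self.weights)

layer = WeightedSum(12)
print(layer(torch.rand(12, 256)).shape)     # torch.Size([256])
print(layer(torch.rand(2, 12, 256)).shape)  # torch.Size([2, 256])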
Upvotes: 1