Reputation: 51
Say I have an input array of size (64, 100):
t = torch.randn((64,100))
Now say I want to multiply each of the 6400 elements of t with 6400 separate vectors, each of size 256, to produce a tensor of size [64, 100, 256]. This is what I am doing currently:
import copy
import torch
import torch.nn as nn

def clones(module, N):
    "Produce N identical layers."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

linears = clones(nn.Linear(1, 256, bias=False), 6400)

idx = 0
t_final = []
for i in range(64):
    t_bs = []
    for j in range(100):
        # scale the (256,)-shaped weight of the idx-th layer by the scalar t[i, j]
        t1 = t[i, j] * linears[idx].weight.view(-1)
        idx += 1
        t_bs.append(t1)
    t_bs = torch.cat(t_bs).view(1, 100, 256)
    t_final.append(t_bs)
t_final = torch.cat(t_final)
print(t_final.shape)
Output: torch.Size([64, 100, 256])
Is there a faster and cleaner way of doing the same thing? I tried torch.matmul and torch.dot but couldn't do any better.
Upvotes: 0
Views: 204
Reputation: 1685
You don't actually need to clone your linear layer if you really want to multiply tensor t with the same linear-layer weight 6400 times. Instead, you can do the following:
t = torch.randn((64, 100)).unsqueeze(-1)  # shape: [64, 100, 1]
w = torch.rand(256).view(1, 1, 256).repeat(64, 100, 1)
# or
w = torch.stack(6400 * [torch.rand(256)]).view(64, 100, 256)
result = t * w  # shape: [64, 100, 256]
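Note that both constructions of w above tile one and the same 256-dim vector, so under that assumption the explicit repeat/stack can be dropped and broadcasting will do the expansion implicitly; a minimal sketch:
t = torch.randn((64, 100)).unsqueeze(-1)  # shape: [64, 100, 1]
w = torch.rand(256)                       # a single shared vector, shape: [256]
result = t * w                            # broadcasts to shape: [64, 100, 256]
print(result.shape)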
However, if you want to keep the same structure you currently have, you can do the following:
t = torch.randn((64, 100)).unsqueeze(-1)  # shape: [64, 100, 1]
w = torch.stack([linears[i].weight for i in range(len(linears))]).view(64, 100, 256)
result = t * w  # shape: [64, 100, 256]
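As a quick sanity check (a sketch that rebuilds linears exactly as in the question, deep-copying one base layer), the vectorized result should match the loop output element for element, since the loop visits index idx = i * 100 + j:
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
t = torch.randn((64, 100))
base = nn.Linear(1, 256, bias=False)
linears = nn.ModuleList([copy.deepcopy(base) for _ in range(6400)])

# vectorized version
w = torch.stack([lin.weight for lin in linears]).view(64, 100, 256)
result = t.unsqueeze(-1) * w

# compare one entry against the loop version
i, j = 3, 7
expected = t[i, j] * linears[i * 100 + j].weight.view(-1)
print(torch.allclose(result[i, j], expected))  # True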
Upvotes: 0
Reputation: 565
It seems broadcasting is what you are looking for.
t = torch.randn((64, 100)).view(6400, 1)   # one scalar per element
weights = torch.randn((6400, 256))         # one 256-dim vector per scalar
output = (t * weights).view(64, 100, 256)  # broadcast multiply, then reshape
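If the 6400 vectors are meant to stay learnable (as the ModuleList of Linear layers in the question suggests), the same broadcast can live behind a single nn.Parameter instead of 6400 layers; a minimal sketch, with ScaleByVectors being a hypothetical module name:
import torch
import torch.nn as nn

class ScaleByVectors(nn.Module):
    """Multiply each scalar of a (rows, cols) input by its own learnable vector."""
    def __init__(self, rows=64, cols=100, dim=256):
        super().__init__()
        # one learnable vector per input element, replacing 6400 Linear(1, 256) layers
        self.weights = nn.Parameter(torch.randn(rows, cols, dim))

    def forward(self, t):
        # t: [rows, cols] -> [rows, cols, 1], broadcast against [rows, cols, dim]
        return t.unsqueeze(-1) * self.weights

module = ScaleByVectors()
out = module(torch.randn(64, 100))
print(out.shape)  # torch.Size([64, 100, 256])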
Upvotes: 2