Reputation: 37691
I have a tensor of size 4 x 6, where 4 is the batch size and 6 is the sequence length. Every element of the sequence vectors is some index (0 to n). I want to create a 4 x 6 x n tensor where the vectors in the 3rd dimension are the one-hot encodings of those indices, meaning I want to put a 1 at the specified index and zeros everywhere else.
For example, I have the following tensor:
[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]
Here, all the values are between 0 and n, where n = 15. So I want to convert the tensor to a 4 x 6 x 16 tensor where the third dimension represents the one-hot encoding vector. How can I do that using PyTorch functionality? Right now I am doing this with a loop, but I want to avoid looping!
Upvotes: 20
Views: 19581
Reputation: 150
The easiest way I found, where x is a list of numbers and class_count is the number of classes you have:
import torch

def one_hot(x, class_count):
    return torch.eye(class_count)[x, :]
Use it like this:
x = [0,2,5,4]
class_count = 8
one_hot(x,class_count)
tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.]])
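The same indexing trick works when x is itself a 2-D tensor of indices, so it also covers the 4 x 6 case from the question. A quick sketch, assuming 16 classes (the random tensor below is just a stand-in for your index tensor):
import torch

idx = torch.randint(0, 16, (4, 6))   # stand-in for the question's 4 x 6 index tensor
encoded = torch.eye(16)[idx]         # shape: (4, 6, 16), dtype float32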
Upvotes: 15
Reputation: 7691
NEW ANSWER
As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and the number of classes n, you can create a one_hot version as follows:
n = 5
indices = torch.randint(0,n, size=(4,7))
one_hot = torch.nn.functional.one_hot(indices, n) # size=(4,7,n)
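Applied to the tensor from the question, a short sketch (16 classes, since the largest index is 15):
import torch
import torch.nn.functional as F

indices = torch.tensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                        [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]])
one_hot = F.one_hot(indices, num_classes=16)  # size=(4, 6, 16), dtype torch.int64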
Very old Answer
At the moment, slicing and indexing can be a bit of a pain in PyTorch, in my experience. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert them to a dense tensor. That would work as follows:
import torch
from torch.sparse import FloatTensor as STensor

batch_size = 4
seq_length = 6
feat_dim = 16

batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length)) * batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)

my_stack = torch.stack([batch_idx, seq_idx, feat_idx])  # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()
print(my_final_array)
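On newer PyTorch releases, the old torch.sparse.FloatTensor constructor may emit a deprecation warning; the same idea can be written with torch.sparse_coo_tensor. A sketch that reuses my_stack, batch_size, seq_length and feat_dim from the snippet above:
import torch

# Same 3 x 24 index matrix (batch, sequence, feature indices) as my_stack above
dense = torch.sparse_coo_tensor(my_stack,
                                torch.ones(batch_size * seq_length),
                                (batch_size, seq_length, feat_dim)).to_dense()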
Note: PyTorch is currently undergoing some work that will add numpy-style broadcasting and other functionality within the next two or three weeks. So it's possible there'll be better solutions available in the near future.
Hope this helps you a bit.
Upvotes: 29
Reputation: 6547
This can be done in PyTorch using the in-place scatter_ method for any Tensor object.
import torch

num_classes = 3
labels = torch.LongTensor([[[2,1,0]], [[0,1,0]]]).permute(0,2,1) # Let this be your current batch
batch_size, k, _ = labels.size()
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)
For num_classes=3 (the indices should lie in [0, 3)), this will give you
(0 ,.,.) =
0 0 1
0 1 0
1 0 0
(1 ,.,.) =
1 0 0
0 1 0
1 0 0
[torch.FloatTensor of size 2x3x3]
Note that labels should be a torch.LongTensor.
PyTorch Docs Reference: torch.Tensor.scatter_
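The same pattern handles the 4 x 6 x 16 case from the question; a brief sketch, assuming 16 classes:
import torch

labels = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                           [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]])
one_hot = torch.zeros(4, 6, 16)
# scatter_ writes 1 along dim 2 at the positions given by the index tensor
one_hot.scatter_(2, labels.unsqueeze(2), 1)   # shape: (4, 6, 16)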
Upvotes: 3