How to pad text to a custom length after building the vocab in PyTorch

Question

I used a torchtext vocab to convert text to indices.

For example, I have 2 names: aaban and aabharan.

After applying the vocab: [0, 0, 1, 0, 2] and [0, 0, 1, 3, 0, 4, 0, 2]

The longest name in my data has length 24, so after using torch.nn.utils.rnn.pad_sequence([torch.tensor(name) for name in names], batch_first=True, padding_value=-1.)

I got:

tensor([ 0, 0, 1, 0, 2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
tensor([ 0, 0, 1, 3, 0, 4, 0, 2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])

But I want to create tensors of length 50, since there might be longer names that are not in the training data. How can I do that? That is, how can I get the following:

tensor([ 0, 0, 1, 0, 2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
tensor([ 0, 0, 1, 3, 0, 4, 0, 2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])

I tried going through the documentation for both torchtext.vocab and torch.nn.utils, but I couldn't find a way.
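
One possible approach, shown here only as a minimal sketch: pad_sequence pads to the longest sequence in the batch, so the result can be extended on the right with torch.nn.functional.pad until it reaches the target length. The helper name pad_to_length and the constants MAX_LEN and PAD are hypothetical names chosen for illustration, and truncating names longer than 50 is an assumption, since the question does not say how those should be handled.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

MAX_LEN = 50  # target length from the question
PAD = -1      # padding value from the question


def pad_to_length(names, max_len=MAX_LEN, pad_value=PAD):
    """Pad a list of index lists to a fixed length (hypothetical helper)."""
    # Assumption: anything longer than max_len is truncated.
    tensors = [torch.tensor(n[:max_len], dtype=torch.long) for n in names]
    # Shape: (num_names, longest_name_in_this_batch)
    batch = pad_sequence(tensors, batch_first=True, padding_value=pad_value)
    # Extend the last dimension on the right up to max_len.
    extra = max_len - batch.size(1)
    return F.pad(batch, (0, extra), value=pad_value)


names = [[0, 0, 1, 0, 2], [0, 0, 1, 3, 0, 4, 0, 2]]
padded = pad_to_length(names)
print(padded.shape)
```

Because the target length is fixed rather than derived from the batch, the same helper could be applied at inference time to names that never appeared in the training data, as long as they fit within the chosen length.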
