torchtext Field with values already converted to ids: TypeError "an integer is required"

Question

I followed this tutorial, http://www.programmersought.com/article/2609385756/, to create a TabularDataset from data that is already tokenized and converted to ids. I don't want to use a vocab or call build_vocab, because the data is already numerical.

So I defined my field like this:

from torchtext import data
from torchtext.data import Field, BucketIterator

myField = Field(tokenize=x_tokenize, use_vocab=False, sequential=True)
train, val, test = data.TabularDataset.splits(path='./', train=train_path, validation=valid_path, test=test_path, format='csv', fields=data_fields, skip_header=True)
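x_tokenize is not shown in the question. A minimal sketch of what such a tokenizer could look like, assuming each CSV cell stores the ids either as a list literal like "[101, 3177, 102]" or as space-separated numbers (both formats are assumptions). The property that matters with use_vocab=False is that it returns a flat list of ints:

```python
import ast


def x_tokenize(text):
    """Hypothetical tokenizer for a CSV cell that already holds token ids.

    Accepts a list literal such as "[101, 3177, 102]" or whitespace-separated
    ids such as "101 3177 102" and returns a flat list of ints. A nested
    literal like "[[101, 102]]" raises TypeError here (int() of a list),
    so the problem shows up at load time instead of inside the iterator.
    """
    text = text.strip()
    if text.startswith("["):
        ids = ast.literal_eval(text)  # safely parse the list literal
    else:
        ids = text.split()            # space-separated ids
    return [int(tok) for tok in ids]
```

For example, x_tokenize("[101, 3177, 102]") and x_tokenize("101 3177 102") both give [101, 3177, 102].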

The first training example looks as expected:

print(vars(train[0])['src'])
# output: [101, 3177, 3702, 11293, 1116, 102]
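Only train[0].src is printed, but the iterator processes every field of every example in a batch (the traceback goes through Batch.__init__, which calls field.process for each field). A small pure-Python check, sketched below, can list any token that is not a plain int in both src and trg; find_non_int_tokens is a hypothetical helper, and the stand-in examples replace the real dataset:

```python
from types import SimpleNamespace


def find_non_int_tokens(examples, field_names):
    """Return (example_index, field_name, position, type_name) for every
    token that is not a plain int.

    With use_vocab=False nothing converts tokens to ids later, so any
    non-int found here (a nested list, a str, ...) reaches torch.tensor as-is.
    """
    problems = []
    for i, ex in enumerate(examples):
        for name in field_names:
            for pos, tok in enumerate(getattr(ex, name)):
                if not isinstance(tok, int):
                    problems.append((i, name, pos, type(tok).__name__))
    return problems


# Stand-in examples; with torchtext the dataset itself could be passed,
# e.g. find_non_int_tokens(train, ["src", "trg"]).
examples = [
    SimpleNamespace(src=[101, 3177, 102], trg=[101, 102]),
    SimpleNamespace(src=[101, 102], trg=[[101, 102]]),  # nested one level too deep
]
print(find_non_int_tokens(examples, ["src", "trg"]))
```

An empty result means every token in the checked fields is already an int.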

Then I created a BucketIterator:

train_iter = BucketIterator(train,
                            batch_size=BATCH_SIZE,
                            device=DEVICE,
                            sort_key=lambda x: (len(x.src), len(x.trg)),
                            train=True,
                            batch_size_fn=batch_size_fn,
                            repeat=False)
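batch_size_fn is not defined in the question either. In legacy torchtext the iterator calls it as batch_size_fn(new_example, count, size_so_far) after adding each example and compares the return value with batch_size. A hypothetical token-count version, assuming the goal is to cap the padded number of tokens per batch (as transformer-style tutorials commonly do), could be sketched as a closure instead of globals:

```python
from types import SimpleNamespace


def make_token_batch_size_fn():
    """Hypothetical batch_size_fn factory.

    count is the number of examples in the current minibatch, so count == 1
    marks the start of a new batch and resets the running maxima. The
    return value is the padded token count: count * longest sequence seen
    so far in either src or trg.
    """
    state = {"src": 0, "trg": 0}

    def batch_size_fn(new, count, sofar):
        if count == 1:
            state["src"] = 0
            state["trg"] = 0
        state["src"] = max(state["src"], len(new.src))
        state["trg"] = max(state["trg"], len(new.trg))
        return count * max(state["src"], state["trg"])

    return batch_size_fn


# Simulate the iterator feeding two examples into one minibatch.
examples = [SimpleNamespace(src=[101, 3177, 102], trg=[101, 102]),
            SimpleNamespace(src=[101, 3177, 3702, 11293, 102], trg=[101, 102])]
fn = make_token_batch_size_fn()
size = 0
for count, ex in enumerate(examples, start=1):
    size = fn(ex, count, size)
print(size)  # 2 examples * longest sequence (5) = 10
```

With this sketch, batch_size_fn=make_token_batch_size_fn() would be passed to the iterator and BATCH_SIZE read as a token budget rather than an example count.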

When I run this code:

batch = next(iter(train_iter))

I get TypeError: an integer is required (got type list). Full traceback:

TypeError                                 Traceback (most recent call last)
in <module>()
----> 1 batch = next(iter(train_iter))

/usr/local/lib/python3.6/dist-packages/torchtext/data/iterator.py in __iter__(self)
    155     else:
    156         minibatch.sort(key=self.sort_key, reverse=True)
--> 157     yield Batch(minibatch, self.dataset, self.device)
    158 if not self.repeat:
    159     return

/usr/local/lib/python3.6/dist-packages/torchtext/data/batch.py in __init__(self, data, dataset, device)
     32     if field is not None:
     33         batch = [getattr(x, name) for x in data]
---> 34         setattr(self, name, field.process(batch, device=device))
     35 
     36 @classmethod

/usr/local/lib/python3.6/dist-packages/torchtext/data/field.py in process(self, batch, device)
    199 """
    200 padded = self.pad(batch)
--> 201 tensor = self.numericalize(padded, device=device)
    202 return tensor
    203 

/usr/local/lib/python3.6/dist-packages/torchtext/data/field.py in numericalize(self, arr, device)
    321     arr = self.postprocessing(arr, None)
    322 
--> 323 var = torch.tensor(arr, dtype=self.dtype, device=device)
    324 
    325 if self.sequential and not self.batch_first:

TypeError: an integer is required (got type list)
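The last frames show where it fails: Field.process pads the batch, and numericalize then hands the padded rows directly to torch.tensor(arr, dtype=self.dtype, device=device). With use_vocab=False there is no vocabulary lookup in between, so every element, including the padding token, must already be an int; the message "got type list" suggests that some example is nested one level too deep. Field's pad_token argument defaults to the string "<pad>", so for numeric data it likely needs to be an int as well (pad_token=0 below is an assumed padding id). A simplified pure-Python sketch of the padding step (pad_batch is a hypothetical helper that ignores init/eos tokens and fix_length; it is not torchtext's code):

```python
def pad_batch(batch, pad_token):
    """Extend each sequence with pad_token up to the longest one, giving a
    rectangular list of lists that torch.tensor can turn into a 2-D tensor."""
    max_len = max(len(seq) for seq in batch)
    return [list(seq) + [pad_token] * (max_len - len(seq)) for seq in batch]


batch = [[101, 3177, 3702, 11293, 1116, 102], [101, 102]]
padded = pad_batch(batch, pad_token=0)
print(padded)  # [[101, 3177, 3702, 11293, 1116, 102], [101, 102, 0, 0, 0, 0]]

# Every entry must be a plain int: a str pad token such as "<pad>" or a
# nested list inside one example would make torch.tensor fail.
assert all(isinstance(tok, int) for row in padded for tok in row)
```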
