Reputation: 334
I'm using PyTorch to create several models, each of which runs in a separate notebook.
When I use a torchtext Field to build the vocab, it assigns a number to each class, which is fine, and my original class labels are also numbers. However, the index assigned to each class is not the same as the original class label. Is there a way to assign an exact index to each class in my LABEL vocab?
My code that creates the torchtext Field:
LABEL = data.LabelField()
LABEL.build_vocab(train_data)
My result looks like this:
print(LABEL.vocab.stoi)
defaultdict(None, {'1': 0, '2': 1, '0': 2})
The result I want:
defaultdict(None, {'0': 0, '1': 1, '2': 2})
I wrote this code as a possible solution. Is it correct to create the vocab like this?
LABEL.build_vocab({'0': 0, '1': 1, '2': 2})
P.S. I know this mapping is only used inside the models and everything works fine, but I am worried about comparing the models' results on test data, and even more about confusing myself every time I look at the confusion matrix.
Upvotes: 0
Views: 410
Reputation: 123
I don't think this is going to give you what you want. build_vocab iterates over a dataset and maps an item to an index if it appears in the dataset at least min_freq times (default min_freq=1). I think what you are passing in your last example will tell build_vocab that the item '0' appears 0 times, so it won't be included in your vocabulary.
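To illustrate why the indices come out in a different order than the labels, here is a minimal pure-Python sketch of frequency-based index assignment. It is not torchtext's code: the helper build_label_index and the label counts are hypothetical, and it assumes the vocabulary is ordered by descending frequency with ties broken alphabetically, which is my understanding of how the legacy torchtext Vocab behaves.

```python
from collections import Counter


def build_label_index(labels, min_freq=1):
    """Hypothetical stand-in for a frequency-ordered label vocab."""
    counts = Counter(labels)
    # Keep only labels that occur at least min_freq times.
    kept = [(label, n) for label, n in counts.items() if n >= min_freq]
    # Sort alphabetically first, then (stably) by descending frequency,
    # so ties in frequency stay in alphabetical order.
    kept.sort(key=lambda pair: pair[0])
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return {label: index for index, (label, _) in enumerate(kept)}


# Made-up training labels: '1' is the most frequent class, '0' the rarest.
train_labels = ['1'] * 5 + ['2'] * 3 + ['0'] * 2
stoi = build_label_index(train_labels)
print(stoi)
```

Under these assumptions the most frequent class receives index 0, which would explain a mapping like the one in the question when class '1' dominates the training data.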
If you are concerned about mixing things up in your review process, you can always write a script to get the index of a certain label, then get whatever is at that index, and map it to a new dict with the index you want. This will probably be much easier than messing with the way torchtext is building your vocabulary.
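A rough sketch of such a script, written in plain Python so it does not depend on a particular torchtext version. The stoi dict is copied from the question; the helper to_original_labels and the predicted list are made up for illustration. In a real notebook you would read the mapping from LABEL.vocab.stoi instead.

```python
# Mapping from the question: original label string -> index assigned by the vocab.
stoi = {'1': 0, '2': 1, '0': 2}

# Invert it: position i holds the original label that was assigned index i.
itos = [label for label, _ in sorted(stoi.items(), key=lambda kv: kv[1])]


def to_original_labels(predicted_indices):
    """Translate vocab indices (model outputs) back to the original integer labels."""
    return [int(itos[i]) for i in predicted_indices]


# Hypothetical model predictions expressed as vocab indices.
predicted = [0, 2, 1, 0]
original = to_original_labels(predicted)
print(original)
```

Applying the same translation to both predictions and targets before building the confusion matrix keeps its rows and columns in terms of the original class labels.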
EDIT: A better solution for you might be setting use_vocab=False when defining your label field:
LABEL = data.LabelField(use_vocab=False)
This will work in your case, when your data is already numerical. From the torchtext 0.8 docs:
use_vocab: Whether to use a Vocab object. If False, the data in this field should already be numerical. Default: True.
Upvotes: 2