Reputation: 386
from torchtext.vocab import Vocab
from collections import Counter

def create_vocab(file_, tokenizer):
    counter_dict = Counter()
    for sentence in file_:
        counter_dict.update(tokenizer(sentence))
    return Vocab(counter_dict)

vocab = create_vocab(w_data, tokenizer)
vocab.lookup_indices([1,2,3])
When I try to look up the indices, I get:
AttributeError: 'Counter' object has no attribute 'lookup_indices'
Upvotes: 1
Views: 1134
Reputation: 29317
lookup_indices takes a list of strings (tokens) as input and returns a list of integers (their indices in the Vocab):
lookup_indices(tokens: List[str]) → List[int]
To get the tokens corresponding to the indices [1,2,3], use lookup_tokens instead.
Yes, it's a bit confusing: one would think that lookup_indices looks up by index and lookup_tokens looks up by token, but it's the other way around. Each method is named after what it returns, not what it takes.
Here's a small example:
from collections import Counter, OrderedDict
from torchtext.vocab import vocab
c = Counter(["a", "a", "b", "b", "b", "c"])
# https://pytorch.org/text/stable/vocab.html
# requires an OrderedDict
ordered_dict = OrderedDict(sorted(c.items(), key=lambda x: x[1], reverse=True))
myVocabulary = vocab(ordered_dict)
print("myVocabulary is of type {}".format(type(myVocabulary)))
# myVocabulary is of type <class 'torchtext.vocab.vocab.Vocab'>
print(myVocabulary['a'], myVocabulary['b'])
# 1 0
print(myVocabulary.lookup_indices(['a', 'b']))
# [1, 0]
print(myVocabulary.lookup_tokens([0, 1]))
# ['b', 'a']
print(myVocabulary.lookup_indices(['a', 'b', 'c']))
# [1, 0, 2]
print(myVocabulary['c'])
# 2
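Applied to the function from the question, the same idea could look roughly like the sketch below. To keep it runnable without torchtext installed, it uses a small hypothetical stand-in class (TinyVocab, which is not part of torchtext) that mimics only the two lookup directions. With torchtext you would pass the same OrderedDict to the vocab factory instead. str.split stands in for the tokenizer, and the sample sentences are made up.

```python
from collections import Counter, OrderedDict


class TinyVocab:
    """Hypothetical stand-in for torchtext's Vocab; only the two lookups."""

    def __init__(self, ordered_dict):
        # Assign indices in the order the tokens appear in the OrderedDict.
        self.itos = list(ordered_dict)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def lookup_indices(self, tokens):
        # tokens (str) -> indices (int)
        return [self.stoi[t] for t in tokens]

    def lookup_tokens(self, indices):
        # indices (int) -> tokens (str)
        return [self.itos[i] for i in indices]


def create_vocab(sentences, tokenizer):
    counter = Counter()
    for sentence in sentences:
        counter.update(tokenizer(sentence))
    # Most frequent tokens first, so they get the lowest indices.
    ordered = OrderedDict(
        sorted(counter.items(), key=lambda kv: kv[1], reverse=True)
    )
    return TinyVocab(ordered)  # assumption: with torchtext, vocab(ordered)


sentences = ["b a b", "a b c"]  # made-up sample data
v = create_vocab(sentences, str.split)
indices = v.lookup_indices(["a", "b", "c"])  # strings in, integers out
tokens = v.lookup_tokens([0, 1, 2])          # integers in, strings out
print(indices, tokens)
```

The point is only that the call in the question, lookup_indices([1,2,3]), passes integers where tokens are expected. Swapping in lookup_tokens, or passing token strings, matches the direction of each method.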
Upvotes: 1