Reputation: 13
I am able to create an inverted index, but I cannot quite implement a positional index. Each entry in a positional index has the format [doc_ID, pos_1, pos_2, ...],
where doc_ID indicates which document the word appears in and pos_1, pos_2, ... are the positions at which it appears in that document.
Ex. index = positional_index([['a','b','a'], ['a','c']])
When the user enters index['a'],
it should return [[0,0,2], [1,0]]
The following code builds the inverted index mentioned above. I have no idea what else to add to make it a positional index:
from collections import defaultdict

def positional_index(tokens):
    # Currently a plain inverted index: token -> list of doc IDs.
    d = defaultdict(list)
    for docID, t_list in enumerate(tokens):
        for t in t_list:
            d[t].append(docID)
    return d
All help would be much appreciated.
Upvotes: 0
Views: 1954
Reputation: 180441
Starting from your own code, you just need to append the docID together with the positions of each token. Iterating over a set of each document's tokens ensures every token gets only one entry per document:
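from collections import defaultdict

def positional_index(tokens):
    d = defaultdict(list)
    for docID, sub_l in enumerate(tokens):
        # set() visits each distinct token once per document.
        for t in set(sub_l):
            # Entry format: [docID, pos_1, pos_2, ...]
            d[t].append([docID] + [ind for ind, ele in enumerate(sub_l) if ele == t])
    return d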
In [9]: index= positional_index([['a','b','a'], ['a','c']])
In [10]: index["a"]
Out[10]: [[0, 0, 2], [1, 0]]
In [11]: index["b"]
Out[11]: [[0, 1]]
In [12]: index["c"]
Out[12]: [[1, 1]]
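The function above rescans the whole document once per distinct token. As a hedged alternative (not part of the original answer), the sketch below gathers positions in a single pass over each document. The name positional_index_single_pass and the temporary positions dict are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def positional_index_single_pass(docs):
    """Build {token: [[doc_id, pos_1, pos_2, ...], ...]} with one pass per document."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        # Collect the positions of every token in this document only.
        positions = defaultdict(list)
        for pos, token in enumerate(tokens):
            positions[token].append(pos)
        # Emit one [doc_id, pos_1, ...] entry per distinct token.
        for token, pos_list in positions.items():
            index[token].append([doc_id] + pos_list)
    return index


index = positional_index_single_pass([['a', 'b', 'a'], ['a', 'c']])
print(index['a'])  # [[0, 0, 2], [1, 0]]
```

For short documents the difference is negligible, but for long ones with many distinct tokens this avoids the repeated scans.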
Upvotes: 1
Reputation: 107297
You can use the following function:
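>>> def find_index(l, elem):
...     # One [doc_ID, pos_1, pos_2, ...] entry per document that contains elem.
...     return [[i] + [t for t, k in enumerate(j) if k == elem]
...             for i, j in enumerate(l) if elem in j]
...
>>> l = [['a', 'b', 'a'], ['a', 'c']]
>>> find_index(l, 'a')
[[0, 0, 2], [1, 0]]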
All you need here is enumerate inside two nested list comprehensions.
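If the nested comprehensions are hard to read, the same idea can be written as explicit loops. This is only a sketch: find_index_loops is a hypothetical name, and it adds a check that skips documents in which the token does not occur, so they do not produce a bare [doc_ID] entry.

```python
def find_index_loops(docs, elem):
    """Return [[doc_id, pos_1, pos_2, ...], ...] for every document containing elem."""
    result = []
    # Outer loop: one candidate entry per document.
    for doc_id, tokens in enumerate(docs):
        positions = []
        # Inner loop: record each position where the token matches.
        for pos, token in enumerate(tokens):
            if token == elem:
                positions.append(pos)
        # Only keep documents in which elem actually occurs.
        if positions:
            result.append([doc_id] + positions)
    return result


docs = [['a', 'b', 'a'], ['a', 'c']]
print(find_index_loops(docs, 'a'))  # [[0, 0, 2], [1, 0]]
print(find_index_loops(docs, 'b'))  # [[0, 1]]
```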
Upvotes: 1