Inverted Index Speed Up

Question

I've implemented an inverted index in Python. It is essentially a dictionary whose keys are the words in the corpus and whose values are lists of tuples, each pairing a document the word occurs in with the word's BM25 score for that document.

{
"love": [(doc1, 12), (doc3, 7.9), (doc5, 6.5)],
"hate": [(doc2, 8.7), (doc4, 3.2)]
}

However, when I process a query, I find it hard to benefit from the efficiency of the inverted index, because I must iterate over every word in the query in a for loop. Inside that loop, I must further loop over the documents in the word's posting list and maintain a global score table for all documents.
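
A minimal sketch of the loop described above, assuming document ids are strings and the query is already split into words. The name `score_query` is hypothetical and only there for illustration.

```python
from collections import defaultdict

# Toy index in the same shape as the example above; document ids are strings here.
index = {
    "love": [("doc1", 12), ("doc3", 7.9), ("doc5", 6.5)],
    "hate": [("doc2", 8.7), ("doc4", 3.2)],
}


def score_query(index, query_terms):
    """Sum the per-word BM25 scores for every document that matches a query word."""
    scores = defaultdict(float)  # the global score table
    for term in query_terms:  # outer loop: words in the query
        # inner loop: documents in this word's posting list (empty if the word is unknown)
        for doc_id, bm25 in index.get(term, ()):
            scores[doc_id] += bm25
    return dict(scores)


print(score_query(index, ["love", "hate"]))
```

Only documents that appear in the posting list of at least one query word are touched, so the work is proportional to the total length of those lists rather than to the size of the corpus.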

I don't think this is optimal. Any ideas for speeding it up? I think a batch dictionary, one that accepts multiple keys and returns multiple values in parallel, would help.
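
One possible reading of the batch idea, as a sketch in plain Python: fetch the posting lists for all query words in one call, accumulate the scores, and keep only the top results. The names `batch_lookup` and `top_k` are hypothetical, and the lookup here is still sequential, not parallel.

```python
import heapq
from collections import Counter
from itertools import chain


def batch_lookup(index, query_terms):
    """Return the posting lists for all query words in one call (unknown words give [])."""
    return [index.get(term, []) for term in query_terms]


def top_k(index, query_terms, k=10):
    """Accumulate scores over all fetched posting lists and return the k best documents."""
    totals = Counter()
    for doc_id, bm25 in chain.from_iterable(batch_lookup(index, query_terms)):
        totals[doc_id] += bm25
    # nlargest avoids sorting the whole score table when only k results are needed
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])
```

Given an index shaped like the example above, `top_k(index, ["love", "hate"], k=2)` would return the two highest-scoring `(document, score)` pairs.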
