theoutlaw

Reputation: 79

Inverted Index where I can save a tuple of the word along with an id of where it came from

I have created the following class to implement an inverted index in Python. I read the questions from the Quora Question Pairs challenge. The questions are in this form:

---------------------------
qid  |question         
---------------------------
  1  |Why do we exist?
  2  |Is there life on Mars?
  3  |What happens after death?
  4  |Why are bananas yellow?

The problem is that I want the qid to be passed along with each word into the inverted index, so that once the index is built I know which question each word came from and can access it easily.
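To make the goal concrete, here is a minimal sketch of the mapping being asked for, built from the four sample rows in the table: each word points to the qids of the questions that contain it. The `tokenize` helper is a hypothetical, deliberately naive tokenizer (lowercase, strip the trailing `?`, split on whitespace) assumed only for this illustration.

```python
from collections import defaultdict

# Sample rows from the table above: (qid, question)
questions = [
    (1, "Why do we exist?"),
    (2, "Is there life on Mars?"),
    (3, "What happens after death?"),
    (4, "Why are bananas yellow?"),
]


def tokenize(text):
    # Hypothetical naive tokenizer (assumption for this sketch):
    # lowercase, drop the trailing '?', split on whitespace.
    return text.lower().rstrip("?").split()


# Inverted index: word -> list of qids of the questions containing that word
index = defaultdict(list)
for qid, question in questions:
    for word in tokenize(question):
        if qid not in index[word]:  # record each qid at most once per word
            index[word].append(qid)

print(index["why"])   # [1, 4]
print(index["mars"])  # [2]
```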

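from collections import defaultdict


class Index:
    """ Inverted index data structure """

    def __init__(self, stemmer=None):
        self.index = defaultdict(list)  # word -> list of internal document ids
        self.documents = {}             # internal document id -> list of words
        self.stemmer = stemmer          # optional object with a .stem(word) method
        self.__unique_id = 0

    def lookup(self, word):
        """
        Look up a word in the index and return the documents that contain it
        """
        word = word.lower()
        if self.stemmer:
            word = self.stemmer.stem(word)

        # .get(word, []) avoids a TypeError (and no empty entry is created) for unknown words
        return [self.documents.get(doc_id) for doc_id in self.index.get(word, [])]

    def addProcessed(self, words):
        """
        Add an already tokenized document (a list of words) to the index;
        the words are assumed to be lowercased (and stemmed, if a stemmer is used)
        """
        for word in words:
            if self.__unique_id not in self.index[word]:
                self.index[word].append(self.__unique_id)

        self.documents[self.__unique_id] = words
        self.__unique_id += 1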

How could I implement this in the data structure above?

Upvotes: 1

Views: 206

Answers (1)

Oluwafemi Sule

Reputation: 38982

A straightforward way to get the qid into your index is to change Index.addProcessed to accept qid as a second argument and store it alongside the words as the value for the self.__unique_id key in self.documents.

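def addProcessed(self, words, qid):
    for word in words:
        if self.__unique_id not in self.index[word]:
            self.index[word].append(self.__unique_id)

    # Store the question id next to the words for this document
    self.documents[self.__unique_id] = (words, qid)
    self.__unique_id += 1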

Index.lookup will then return a list of (words, qid) tuples, one for each question that contains the looked-up word.
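Putting the change together, a minimal self-contained sketch of the class with qid support might look like the following. The optional `stemmer` constructor argument is an assumption added so that `lookup` does not fail on the otherwise undefined `self.stemmer`; the sample words and qids are taken from the question's table.

```python
from collections import defaultdict


class Index:
    """Inverted index that also remembers each document's question id (qid)."""

    def __init__(self, stemmer=None):
        self.index = defaultdict(list)  # word -> list of internal document ids
        self.documents = {}             # internal document id -> (words, qid)
        self.stemmer = stemmer          # assumed: optional object with a .stem(word) method
        self.__unique_id = 0

    def lookup(self, word):
        """Return a list of (words, qid) tuples for documents containing word."""
        word = word.lower()
        if self.stemmer:
            word = self.stemmer.stem(word)
        return [self.documents[doc_id] for doc_id in self.index.get(word, [])]

    def addProcessed(self, words, qid):
        """Add an already tokenized question together with its qid."""
        for word in words:
            if self.__unique_id not in self.index[word]:
                self.index[word].append(self.__unique_id)
        self.documents[self.__unique_id] = (words, qid)
        self.__unique_id += 1


idx = Index()
idx.addProcessed(["why", "do", "we", "exist"], 1)
idx.addProcessed(["why", "are", "bananas", "yellow"], 4)

for words, qid in idx.lookup("Why"):
    print(qid, words)
# 1 ['why', 'do', 'we', 'exist']
# 4 ['why', 'are', 'bananas', 'yellow']
```

If the qids are guaranteed to be unique, another option would be to drop the internal counter and use the qid itself as the key in self.documents and in the posting lists, so that the index maps words straight to qids.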

Upvotes: 1
