Amrith Krishna
Amrith Krishna

Reputation: 2851

Efficient data structure for storing data with relative ordering

I have to store a sentence along with the possible semgments of the sentence into an efficient data structure. Currently I use a dictionary followed by a list for each key of the dictionary to store the segments. Can I use a better data structure to store the same efficiently. I have detailed the whole requirements below.

Input sentence with possible candidate segments

Here, the sentence starts with pravaramuku.........yugalah, the one without any background colour. Each of the colored boxes numbered 1 to 24 are segments of the sentence.

Now currently I store the following as follows

class sentence:
      sentence = "pravaramuku....."
      segments = dict()

The keys are starting position of the box relative to the sentence and values are objects storing details of each of the box.

    segments = {0: [pravara_box1, pravara_box10], 
7:[mukuta_box2], 
13:[manim_box3,maninm_box11,mani_box19,mani_box_25],...........}

Two boxes are said to be conflicting, if the key of one of the boxes is in between the key and key+len(word in box) of the other box (the range is inclusive). For example, Box 7 and Box 15 are conflicting and so is boxes 3 and 11.

In the program, one of the boxes will be selected as winner which is decided by a magic method. Once a winner is selected, its conflicting boxes are removed. Again another box is selected and this iteratively continues till no boxes remain.

Now, Currently my data-structure as you can see is a dictionary with each key has a list as its value.

What would be a better data structure to handle this, as currently the eliminating conflicting nodes portion is taking a lot of time.

My requirements can be summarized as follows:

Any help that can partially solve my problems are appreciated.

Upvotes: 2

Views: 339

Answers (1)

inspectorG4dget
inspectorG4dget

Reputation: 114035

It seems like you could get away with a backtracking depth-first-search on a properly constructed tree:

sentence = "pravaramuku.........yugalah"
words = sentenceToWords(sentence)  # it seems like you already have this

tree = collections.defauldict(list)
for word in words:
    for i in (i for i in range(len(sentence)) if sentence[i:i+len(word)] == word):
        tree[i].append(word)

Once that's done, you just need a depth first traversal of your tree:

def makeSentences(tree, pos=None, sofar=None):
    if pos is None: pos = 0
    if sofar is None: sofar = []
    if pos not in tree: print(' '.join(sofar))
    for word in tree[pos]:
        makeSentences(tree, pos+len(word), sofar+[word])

And then:

makeSentences(tree)

Upvotes: 1

Related Questions