Piyush Jaipuriyar

Reputation: 25

Fast way to search for lists in list of lists

I have a list of sentences and a list of queries. Each query consists of distinct space-separated words, and I have to find the sentences that contain all of a query's words and print the indexes of those sentences. Example:

3
hey how are you
how do you do
how are you doing
2
how
how are

Output:

0 1 2
0 2

The input structure is something like this:

sentences = ['hey how are you' , 'how do you do' , 'how are you doing']
queries = ['how', 'how are']

I have been using an O(n^3) algorithm, but it is very slow and gives me a TLE (time limit exceeded). Is there a faster way to do it, maybe with a regex? I haven't been able to figure out how to build the expression.

The input size is limited to 10^4.

My code:

def textQueries(sentences, queries):
    # Count the words of each sentence into a dict.
    def maptoDict(sentence):
        d = {}
        for word in sentence.split():
            if word not in d:
                d[word] = 1
            else:
                d[word] += 1
        return d

    s = list(map(maptoDict, sentences))
    q = list(set(query.split()) for query in queries)
    for query in q:
        res = []
        for i in range(len(s)):
            # A sentence matches if every query word occurs in it.
            if query.issubset(s[i].keys()):
                res.append(i)
        if not res:
            res.append(-1)
        for r in res:
            print(r, end=' ')
        print()

Upvotes: 0

Views: 601

Answers (3)

nice_dev

Reputation: 17805

You can store every contiguous word sequence of each sentence as a key in a map. The value for each key will be a list (of sentence indices, of course). Below is the pseudocode.

Pseudocode:

Map<string, list> map
for each_sentence in sentence_list:
    words = each_sentence.split("\\s")
    for i = 0 to words.length():
        for j = i to words.length():
            subword = string from i to j
            if map.containsKey(subword):
                map.get(subword).add(each_sentence's index)
            else:
                map.put(subword, new list(each_sentence's index))

for each_query in query_list:
    print map.containsKey(each_query) ? map.get(each_query) : -1

Time Complexity: roughly O(n^2) per sentence to build the map, where n is the maximum number of words in a sentence; each query is then answered with a single map lookup.
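A minimal Python sketch of that pseudocode (the function and variable names are mine, and each sentence index is added at most once per phrase):

# Illustrative sketch of the map-of-phrases idea above, not the answer's exact code.
from collections import defaultdict

def build_phrase_index(sentences):
    index = defaultdict(list)                      # phrase -> list of sentence indices
    for idx, sentence in enumerate(sentences):
        words = sentence.split()
        for i in range(len(words)):
            for j in range(i, len(words)):
                phrase = ' '.join(words[i:j + 1])  # the "subword" from i to j
                if not index[phrase] or index[phrase][-1] != idx:
                    index[phrase].append(idx)
    return index

sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']
index = build_phrase_index(sentences)
for query in queries:
    print(*index.get(query, [-1]))
# 0 1 2
# 0 2

Note that, like the pseudocode, this matches a query only as a contiguous phrase, which happens to be enough for the sample input.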

Upvotes: 0

vash_the_stampede

Reputation: 4606

I formatted the output so you can trace the loop and see how each item is retrieved. You can reuse the pieces of this to print just the index if you'd like, but I wanted you to see how to get the things you are asking for.

sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']

for i, items in enumerate(sentences):
    for j in queries:
        if j in items:
            print(f"Query '{j}' is in Sentence {i}")

Output

(xenial)vash@localhost:~/python/stack_overflow$ python3.7 sent_find.py 
Query 'how' is in Sentence 0
Query 'how are' is in Sentence 0
Query 'how' is in Sentence 1
Query 'how' is in Sentence 2
Query 'how are' is in Sentence 2

This will get that basic output:

sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']

for i in queries:
    for j, items in enumerate(sentences):
        if i in items:
            print(j, end=' ')
    print()

Output

(xenial)vash@localhost:~/python/stack_overflow$ python3.7 sent_find.py 
0 1 2 
0 2 

Upvotes: 1

aghast

Reputation: 15300

Python supports a set data type. You can post-process your sentences to produce a map from words to sets of sentence indices.

That is, a map like:

word_in_sentences["how"] = set(0, 1, 2)

With that data structure, you can compute the set intersection over all of a query's words. The result is the set of indices of sentences that contain every word in the query, with no concern about the order of the words.

Once you have filtered the sentences down to that smaller group, any further order-sensitive search should be faster.
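A minimal sketch of that inverted-index idea (the name word_in_sentences is just illustrative):

# Build a word -> set-of-sentence-indices map, then intersect per query.
from collections import defaultdict

sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']

word_in_sentences = defaultdict(set)
for idx, sentence in enumerate(sentences):
    for word in sentence.split():
        word_in_sentences[word].add(idx)

for query in queries:
    # Intersect the index sets of all the query's words.
    matches = set.intersection(*(word_in_sentences.get(w, set()) for w in query.split()))
    print(*(sorted(matches) if matches else [-1]))
# 0 1 2
# 0 2

The index is built once in time proportional to the total number of words, and each query then costs only a few set intersections.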

Upvotes: 1
