Reputation: 25
I have a list of sentences and a list of queries. The queries have distinct space-separated words, I have to find the sentences having all the queries and print the indexes of the sentences. Example:
3
hey how are you
how do you do
how are you doing
2
how
how are
Output:
0 1 2
0 2
The input structure is something like this:
sentences = ['hey how are you' , 'how do you do' , 'how are you doing']
queries = ['how', 'how are']
I have been using O(n^3) algorithm but that's very slow and giving me a TLE. Is there a faster way to do it, maybe regex but I haven't been able to figure out how to build the expression?
The input size is limited to 10^4.
My code :
def textQueries(sentences, queries):
def maptoDict(sentence):
d = {}
for word in sentence.split():
if word not in d.keys():
d[word] = 1
else:
d[word] += 1
return d
s = list(map(maptoDict,sentences))
q = list(set(query.split()) for query in queries)
for query in q:
res = []
for i in range(len(s)):
if query.issubset(set(s[i].keys())):
res.append(i)
if not len(res):
res.append(-1)
for r in res:
print(r, end = ' ')
print()
Upvotes: 0
Views: 601
Reputation: 17805
You can store each subarray of strings in a map
. The value
of the key
in the map
will be a list(of indices of course). Below is the pseudocode-
Pseudocode:
Map<string,list> map
for each_sentence in sentence_list:
words = each_sentence.split("\\s")
for i = 0 to words.length():
for j=i to words.length():
subword = string from i to j
if map.containsKey(subword):
map.get(subword).add(each_sentence's index)
else:
map.put(subword,new list(each_sentence's index))
for each_query in query_list:
print map.containsKey(each_query) ? map.get(each_query) : -1
Time Complexity: O(n^2) where n
is the maximum length of a sentence among all sentences.
Upvotes: 0
Reputation: 4606
I formatted the output so you could trace the loop to see how each item is retrieved. You can use the elements of this to just print the index
if you'd like but I wanted you to see how to get the things you are requesting.
sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']
for i, items in enumerate(sentences):
for j in queries:
if j in items:
print(f"Query '{j}' is in Sentence {i}")
Output
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 sent_find.py Query 'how' is in Sentence 0 Query 'how are' is in Sentence 0 Query 'how' is in Sentence 1 Query 'how' is in Sentence 2 Query 'how are' is in Sentence 2
This will get that basic output:
sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']
for i in queries:
for j, items in enumerate(sentences):
if i in items:
print(j, end=' ')
print()
Output
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 sent_find.py
0 1 2
0 2
Upvotes: 1
Reputation: 15300
Python supports the data structure called a set. You can post-process your sentences to produce a map of words to sets.
That is, a map like:
word_in_sentences["how"] = set(0, 1, 2)
With that data structure, you can compute the set intersection of all the query words. This would give you a set that contains all the words in the query, with no concern about the order of the words.
Once you filter the sentences down to that smaller group, doing whatever ordering search should be faster.
Upvotes: 1