Reputation: 3998
I have read a lot of ways to find a substring in a string. But in my case, I need to find a string of words (ie substring) inside a string. We can achieve this in O(n^3)
times which is a very bad time complexity
For example
sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]<br/>
phrases = ["jim tom", "likes"]
I want to find the complete phrase in the sentence irrespective of the position
In the above case, the output will be
[2, [0,1]]
Explanation: Wherever all words of phrase match in the sentence return the index of the sentence
1) The first phrase jim tom is only present in 2nd indexes of sentences which is tom does not like jim so return the 2nd index
2) Whereas the second phrase likes are present in 1st and 2nd array so return 0 and 1 indices
I did with brute force but that is not an efficient way to do it
final_arr = []
phrases.each do |phrase|
temp_arr = []
sentences.each_with_index do |sentence, index|
multiple_word_phrase = phrase.split(" ")
if multiple_word_phrase.length > 1
flag = 1
multiple_word_phrase.each do |word|
if !sentence.include?(word)
flag = 0
break
end
end
temp_arr << index if flag == 1
else
temp_arr << index if sentence.include?(phrase)
end
end
final_arr << temp_arr if temp_arr.any?
end
Is there any efficient way to achieve this problem O(NlogN) Time
. I think this can be achieved with dynamic programming but not sure how to do it
Upvotes: 1
Views: 179
Reputation: 173572
There's not much you can optimise in terms of algorithm, but you can shorten the code a fair deal:
phrases.map do |phrase|
words = phrase.split
sentences.map.with_index do |sentence, index|
index if words.all? { |word| sentence[word] }
end.compact
end
Breakdown:
phrases
, so you can express that with a map operation.sentences
, so you'd either use filter()
or map().compact
all?()
is used to describe all words must exist inside each sentence.Upvotes: 1
Reputation: 110685
You can speed the calculations as follows:
require 'set'
h = sentences.each_with_index.with_object({}) do |(str,i),h|
h[i] = str.split.to_set
end
#=> {0=>#<Set: {"jim", "likes", "mary"}>,
# 1=>#<Set: {"kate", "likes", "tom"}>,
# 2=>#<Set: {"tom", "does", "not", "like", "jim"}>}
keys = h.keys
#=> [0, 1, 2]
phrases.map do |p|
pa = p.split
keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end
#=> [[2], [0, 1]]
The return value is not quite the return value required by the question: [2, [0,1]]
. I suggest making all elements of this array arrays, however, even when they contain only a single element (e.g., [2]
). This will make life easier for the coder down the road. If one wants [2, [0,1]]
, however, perform a simple adjustment at the end:
phrases.map do |p|
pa = p.split
keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end.map { |e| e.size == 1 ? e.first : e }
#=> [2, [0, 1]]
As the computational complexity of set lookups is close to O(1) (constant), the computational complexity of this approach is close to O(n2), where n
is some measure to the sizes of sentences
and phrases
.
Upvotes: 1
Reputation: 906
I am not very familiar with Ruby, but if you have concepts like hashmaps and hashsets you can optimize it. As I mentioned in my comment, if you are convinced that the time complexity of your algorithm is O(N^3)
then you can optimize it to O(N^2)
.
To do so, take the array sentences
and transform it to a hashmap that associates to each word a Set of indexes where it appears. For your example it will look like:
"jim" -> Set(0, 2), "tom" -> Set(1, 2), "kate" -> Set(1)
etc... This will take a time complexity of O(N)
(because of O(1)
time complexity of both look up in hashmap and addition in Set)
Now for each phrase you split it and take the intersection of the Sets of its words. For example, the result of first phrase will be the intersection of Indexes_of("jim")
and indexes_of("tom")
which is Set(1)
. The intersection will take you O(N)
for each phrase. Given that you have N
phrases, the time complexity is O(N^2)
.
Upvotes: 1
Reputation: 11193
Other option using Array#product:
# setup
mapped_phr = phrases.map(&:split).zip(0..)
mapped_sen = sentences.zip(0..)
# loop
res = mapped_phr
.product(mapped_sen)
.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(phr, sen), h|
h[phr.first] << sen.last if phr.first.all? { |e| sen.first.include? e }
end
res #=> {["jim", "tom"]=>[2], ["likes"]=>[0, 1]}
res.values #=> [[2], [0, 1]]
Or you can join the phr.first
to get a String as hash key.
Upvotes: 1
Reputation: 18950
Maybe something like this using each_with_index
, and an array of arrays for the phrases (I think it's nicer):
sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = [["jim", "tom"], ["likes"]]
final_arr = []
sentences.each_with_index do |sentence, index|
phrases.each do |words|
if words.all? { |word| sentence.include? word }
final_arr << index
end
end
end
Though, it's basically the same complexity.
Upvotes: 1