Aniket Tiwari

Reputation: 3998

Find an array of strings inside an array of strings

I have read about many ways to find a substring in a string, but in my case I need to find all the words of a phrase (i.e. multiple substrings) inside a string. This can be done in O(n^3) time, which is a very bad time complexity.

For example

sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = ["jim tom", "likes"]

I want to find all the words of each phrase in a sentence, irrespective of their position or order.

In the above case, the output will be

[2, [0,1]]

Explanation: for each phrase, return the indices of every sentence that contains all of the phrase's words.

1) The first phrase, "jim tom", only matches the sentence at index 2 ("tom does not like jim"), so return 2.
2) The second phrase, "likes", matches the sentences at indices 0 and 1, so return [0, 1].

I did it with brute force, but that is not an efficient way to do it:

final_arr = []
phrases.each do |phrase|
  temp_arr = []
  words = phrase.split(" ")  # split once per phrase, not once per sentence
  sentences.each_with_index do |sentence, index|
    if words.length > 1
      flag = 1
      words.each do |word|
        unless sentence.include?(word)
          flag = 0
          break
        end
      end
      temp_arr << index if flag == 1
    else
      temp_arr << index if sentence.include?(phrase)
    end
  end
  final_arr << temp_arr if temp_arr.any?
end

Is there a more efficient way to solve this, ideally in O(N log N) time? I think it could be achieved with dynamic programming, but I'm not sure how.

Upvotes: 1

Views: 179

Answers (5)

Ja͢ck

Reputation: 173572

There's not much you can optimise in terms of algorithm, but you can shorten the code a fair deal:

phrases.map do |phrase|
  words = phrase.split
  sentences.map.with_index do |sentence, index|
    index if words.all? { |word| sentence[word] }
  end.compact
end

Breakdown:

  • The end result has the same dimensions as phrases, so you can express that with a map operation.
  • Inside each result, the list of search results has at most as many elements as sentences, so you'd use either filter() or map().compact.
  • For the innermost loop, all?() expresses that every word must exist inside the sentence.
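For the question's sample input, the snippet above (reproduced here with the data inlined so it runs as-is) yields the nested form of the expected result:

```ruby
sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = ["jim tom", "likes"]

result = phrases.map do |phrase|
  words = phrase.split
  sentences.map.with_index do |sentence, index|
    # keep the index when every word occurs in the sentence (substring match)
    index if words.all? { |word| sentence[word] }
  end.compact
end

result #=> [[2], [0, 1]]
```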

Upvotes: 1

Cary Swoveland

Reputation: 110685

You can speed up the calculations as follows:

require 'set'

h = sentences.each_with_index.with_object({}) do |(str,i),h|
  h[i] = str.split.to_set
end
  #=> {0=>#<Set: {"jim", "likes", "mary"}>,
  #    1=>#<Set: {"kate", "likes", "tom"}>,
  #    2=>#<Set: {"tom", "does", "not", "like", "jim"}>} 

keys = h.keys
  #=> [0, 1, 2]

phrases.map do |p|
  pa = p.split
  keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end
  #=> [[2], [0, 1]]

The return value is not quite the one required by the question ([2, [0,1]]). I suggest keeping every element of the result an array, even when it contains only a single element (e.g. [2]); that will make life easier for the coder down the road. If [2, [0,1]] is required, however, perform a simple adjustment at the end:

phrases.map do |p|
  pa = p.split
  keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end.map { |e| e.size == 1 ? e.first : e }
  #=> [2, [0, 1]]

As the computational complexity of set lookups is close to O(1) (constant), the computational complexity of this approach is close to O(n^2), where n is some measure of the sizes of sentences and phrases.

Upvotes: 1

Ayoub Omari

Reputation: 906

I am not very familiar with Ruby, but if it has concepts like hashmaps and hashsets, you can optimize this. As I mentioned in my comment, if you are convinced that the time complexity of your algorithm is O(N^3), then you can reduce it to O(N^2).

To do so, take the array sentences and transform it into a hashmap that associates each word with a Set of the indices of the sentences it appears in. For your example it will look like: "jim" -> Set(0, 2), "tom" -> Set(1, 2), "kate" -> Set(1), etc. Building this takes O(N) time (thanks to the O(1) cost of both hashmap lookup and Set insertion).

Now, for each phrase, split it and take the intersection of the Sets of its words. For example, the result for the first phrase is the intersection of indexes_of("jim") and indexes_of("tom"), which is Set(2). Each intersection takes O(N) per phrase, and given that you have N phrases, the total time complexity is O(N^2).
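Since this answer doesn't include Ruby code, here is one possible sketch of the inverted-index idea described above (word_index is a name chosen for illustration; it matches whole words rather than substrings):

```ruby
require 'set'

sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = ["jim tom", "likes"]

# Build the inverted index: word => Set of sentence indices containing it.
word_index = Hash.new { |h, k| h[k] = Set.new }
sentences.each_with_index do |sentence, i|
  sentence.split.each { |word| word_index[word] << i }
end

# For each phrase, intersect the index sets of its words.
result = phrases.map do |phrase|
  phrase.split.map { |word| word_index[word] }.reduce(:&).to_a.sort
end

result #=> [[2], [0, 1]]
```

Note that exact-word matching (splitting on whitespace) is a deliberate choice here; unlike substring matching, it avoids false positives such as the word "like" matching inside "likes".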

Upvotes: 1

iGian

Reputation: 11193

Other option using Array#product:

# setup
mapped_phr = phrases.map(&:split).zip(0..)
mapped_sen = sentences.zip(0..)

# loop
res = mapped_phr
  .product(mapped_sen)
  .each_with_object(Hash.new { |h, k| h[k] = [] }) do |(phr, sen), h|
    h[phr.first] << sen.last if phr.first.all? { |e| sen.first.include? e }
  end

res #=> {["jim", "tom"]=>[2], ["likes"]=>[0, 1]}
res.values #=> [[2], [0, 1]]

Or you can join phr.first to get a String as the hash key.
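For instance, the joined-key variant might look like this (same setup as above; the nested destructuring is just one way to write it):

```ruby
sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = ["jim tom", "likes"]

mapped_phr = phrases.map(&:split).zip(0..)
mapped_sen = sentences.zip(0..)

res = mapped_phr
  .product(mapped_sen)
  .each_with_object(Hash.new { |h, k| h[k] = [] }) do |((words, _), (sentence, i)), h|
    # use the joined phrase words as a String hash key
    h[words.join(" ")] << i if words.all? { |w| sentence.include? w }
  end

res #=> {"jim tom"=>[2], "likes"=>[0, 1]}
```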

Upvotes: 1

wp78de

Reputation: 18950

Maybe something like this, using each_with_index and an array of arrays for the phrases (I think it's nicer):

sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = [["jim", "tom"], ["likes"]]

final_arr = []
phrases.each do |words|
    temp = []
    sentences.each_with_index do |sentence, index|
        temp << index if words.all? { |word| sentence.include? word }
    end
    final_arr << temp
end


Though, it's basically the same complexity.

Upvotes: 1
