Soumyadip Chakraborty
Soumyadip Chakraborty

Reputation: 775

Getting timeouts for huge arrays

I am taking some sentences in an array and some keywords in queries to check whether the keywords are present in sentences. For small sentences arrays it works fine but for huge array sentences it gets timeout everytime. Any idea on how to optimise this. TIA

def textQueries(sentences, queries)
  queries.map { |query|
    index_arr = []
    sentences.map.with_index { |sentence, index|
      sentence_arr = sentence.split(' ')
      if query.split(' ').all? { |qur| sentence_arr.include?(qur) }
        index_arr << index
      end
    }
    index_arr << -1 if index_arr.empty?
    puts index_arr.join " "
  }
end

Example inputs :
**Sentences**:
it go will away
go do art
what to will east
**Queries**
it will
go east will
will

**Expected Result**
0
-1
0 2

Upvotes: 0

Views: 67

Answers (1)

3limin4t0r
3limin4t0r

Reputation: 21130

There are a few optimizations that I see at first glance:

  1. You are currently splitting each sentence for every query. Your sample data has 3 sentences and 3 queries. This means each sentence is split 3 times (once of each query). Since the result doesn't depend on the query you should do this up front. Each sentence should only be split once.

  2. You are currently using sentences.map to iterate sentences, but don't capture the result. You are only using it for iteration purposes and push results to the index_arr. map creates a new array which you don't use, meaning you are chewing up memory that could be used elsewhere. This could be changed to each which is far more efficient if you don't use the return value.

  3. The code query.split(' ').all? { |qur| sentence_arr.include?(qur) } isn't really optimal, since it starts searching for a specific word from the front of sentence_arr each time. Checking if a certain collection is a subset or superset of another collection is something where Set often shines.

With all the above in mind something like this should be a lot faster:

require 'set'

def text_queries(sentences, queries)
  sentences = sentences.map { |sentence| Set.new(sentence.split(' ')) }

  queries.map do |query|
    query = Set.new(query.split(' '))

    indexes = sentences.each_index.select { |index| sentences[index] >= query }
    indexes << -1 if indexes.empty?

    indexes
  end
end

Note: If you decide to output the values to the console (like shown in the question):

puts indexes.join(' ')

Then there is no reason to use queries.map since an array with nil values will be returned (puts always returns nil). Change the map to each in this scenario.

Upvotes: 4

Related Questions