Josh
Josh

Reputation: 5721

How to match data from 2 arrays efficiently with Ruby

Right now I have this:

array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]

array1.each do |item1|
  array2.each do |item2|
    if (item1[0] == item2[0]) && (item1[1] == item2[1])
      p "do stuff"
    end
  end
end

I need to match 2 pieces of data from each array but both arrays are very large and I'm wondering if there is a faster way to do this.

My current setup requires looking at each element in the second array for each element in the first array which seems terribly inefficient.

Upvotes: 3

Views: 161

Answers (3)

seph
seph

Reputation: 6076

Combining map and intersection:

(array1.map { |a| a.first 2 } & array2.map { |a| a.first 2 }).each do
  p "do_stuff"
end

Performance should be good. Memory intensive though.

Upvotes: 2

apeiros
apeiros

Reputation: 1705

This is very slow because - as DigitalRoss already stated - it is O(n^2). Assuming that eql? is just as fine for you as ==, you can build an index and iterate over that instead, that'll be O(n+m) instead:

array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]
index  = {}
found  = []

array1.each do |item1| index[item1.first(2)] = item1 end
array2.each do |item2|
  item1 = index[item2.first(2)]
  found << [item1,item2] if item1 then
end

found.each do |item1, item2| puts "do something" end

This assumes that the combination of the first 2 elements of all elements in array1 are unique within array1. If that's not the case, the code will be slightly more complex:

array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]
index  = {}
found  = []

array1.each do |item1|
  key          = item1.first(2)
  index[key] ||= []
  index[key]  << item1
end
array2.each do |item2|
  items_from_1 = index[item2.first(2)]
  if items_from_1 then
    found.concat(items_from_1.map { |item1| [item1,item2] })
  end
end

found.each do |item1, item2| puts "do something" end

Since you didn't provide any sample data, I didn't test the code.

I hope that helps.

Upvotes: 1

Mladen Jablanović
Mladen Jablanović

Reputation: 44080

If the two arrays is all you have, you can't avoid the O(n^2) complexity that DigitalRoss mentioned. However, you can index the data so that the next time you don't have to do it all again. In the simplest case, you can use hashes to allow direct access to your data based on the criteria used in the test:

index1 = array1.each_with_object({}){|e, acc|
  acc[[e[0], e[1]]] ||= []
  acc[[e[0], e[1]]] << e
}

and the same thing for the other array. Then, your loop would look like:

index1.each do |key1, vals1|
  if vals2 = index2[key1]
    vals1.product(vals2).each do |e1, e2|
      p do_stuff
    end
  end
end

which is, I believe, O(n).

Upvotes: 1

Related Questions