Seal
Seal

Reputation: 1060

Rejecting hash contents if they are not in array

I have this array:

array = ["1", "2", "3", "4"]

I have this array of hashes:

ah = [
 {:id=>"1", :value=>"A"},
 {:id=>"2", :value=>"B"},
 {:id=>"3", :value=>"C"},
 {:id=>"4", :value=>"D"},
 {:id=>"5", :value=>"E"},
 {:id=>"6", :value=>"F"},
 {:id=>"7", :value=>"G"},
 {:id=>"8", :value=>"H"},
     ]

I need to reject any hash in ah whose id is not in array.

What is the best way of achieving this?

Upvotes: 1

Views: 91

Answers (5)

Anthony
Anthony

Reputation: 15967

I realize there is already an accepted answer but since all the answers here are in O(n*m), I thought I'd propose an alternative in O(n)*.

Here's a rough benchmark if the ah array has 100_000 items and we have 10_000 items in the sub array. I'm including fl00r's answer here and Cary's as we're all trying to avoid the O(n*m) scenario.

                                user     system      total        real
select with include        34.610000   0.110000  34.720000 ( 34.924679)
reject with include        34.320000   0.100000  34.420000 ( 34.611992)
group and select            0.170000   0.010000   0.180000 (  0.182358)
select by value             0.040000   0.000000   0.040000 (  0.041073)
select with set             0.040000   0.000000   0.040000 (  0.048331)
hashify then values         0.130000   0.010000   0.140000 (  0.139686)

The code to reproduce this:

require 'benchmark'
require 'set'

list_size = 100_000
sub_list_size = 10_000

ah = Array.new(list_size) { |i| { id: i, value: "A" } }

array = []
sub_list_size.times { array << (0..list_size).to_a.sample }

def group_than_select(ah, array)
  grouped = ah.group_by { |x| x[:id] }

  good_keys = grouped.keys - array
  good_keys.map { |i| grouped[i] }.flatten
end

def select_by_fl00r(ah, array)
  array_hash = array.each_with_object({}){ |i, h| h[i] = true }
  ah.select{ |obj| array_hash[obj[:id]] }
end

def select_with_set(ah, array)
  array_to_set = array.to_set
  ah.select { |h| array_to_set.include?(h[:id]) }
end

def hashify_then_values_at(ah, array)
  h = ah.each_with_object({}) { |g,h| h.update(g[:id]=>g) }
  h.values_at(*(h.keys & array))
end

Benchmark.bm(25) do |x|
  x.report("select with include") do
    ah.select{|el| array.include?(el[:id])}
  end
  x.report("reject with include") do
    ah.reject{|e| !array.include?(e[:id])}
  end
  x.report("group and select") do
    group_than_select(ah, array)
  end
  x.report("select by value") do
    select_by_fl00r(ah, array)
  end
  x.report("select with set") do
    select_with_set(ah, array)
  end
  x.report("hashify then values") do
    hashify_then_values_at(ah, array)
  end
end
  • Hash maps are typically O(1) search though O(n) worst case is possible.

Upvotes: 4

Cary Swoveland
Cary Swoveland

Reputation: 110725

Here are two more possibilities.

array = ["1", "2", "3", "4", "99999999"]

#1

I expect the include? solutions would be considerably faster if array were first converted to a set:

require 'set'

def select_with_set(ah, array) 
  array_to_set = array.to_set
  ah.select { |h| array_to_set.include?(h[:id]) }
end

select_with_set(ah, array) 
  #=> [{:id=>"1", :value=>"A"}, {:id=>"2", :value=>"B"},
  #    {:id=>"3", :value=>"C"}, {:id=>"4", :value=>"D"}] 

#2

If, as in the example, the hash elements of ah have distinct values for :id, one could do this:

def hashify_then_values_at(ah, array)    
  h = ah.each_with_object({}) { |g,h| h.update(g[:id]=>g) }
  h.values_at(*(h.keys & array))
end

hashify_then_values_at(ah, array)
  #=> [{:id=>"1", :value=>"A"}, {:id=>"2", :value=>"B"},
  #    {:id=>"3", :value=>"C"}, {:id=>"4", :value=>"D"}]    

Upvotes: 1

tbrisker
tbrisker

Reputation: 15666

You can select the inverse - the hashes whose id is in array by using this code:

ah.select{|el| array.include?(el[:id])}

If you prefer reject, you can use:

ah.reject{|el| !array.include?(el[:id])}

For more info: Array#reject, Array#select. These methods create a new array, if you want to modify in place use Array#reject! or Array#select!.

Upvotes: 5

fl00r
fl00r

Reputation: 83680

For big pieces of data I would go with some preprocessing to avoid O(n*m) lookups.

array = ["1", "2", "3", "4"]
array_hash = array.each_with_object({}){ |i, h| h[i] = true }
ah.select{ |obj| array_hash[obj[:id]] }

Upvotes: 4

Josh
Josh

Reputation: 5721

A better solution than rejecting those ids that are not in the array is to only accept the ones that do:

ah.select { |hash| array.include?(hash[:id]) }

Upvotes: 2

Related Questions