Reputation: 1060
I have this array:
array = ["1", "2", "3", "4"]
I have this array of hashes:
ah = [
{:id=>"1", :value=>"A"},
{:id=>"2", :value=>"B"},
{:id=>"3", :value=>"C"},
{:id=>"4", :value=>"D"},
{:id=>"5", :value=>"E"},
{:id=>"6", :value=>"F"},
{:id=>"7", :value=>"G"},
{:id=>"8", :value=>"H"},
]
I need to reject any hash in ah
whose id is not in array
.
What is the best way of achieving this?
Upvotes: 1
Views: 91
Reputation: 15967
I realize there is already an accepted answer but since all the answers here are in O(n*m)
, I thought I'd propose an alternative in O(n)*.
Here's a rough benchmark if the ah
array has 100_000
items and we have 10_000
items in the sub array. I'm including fl00r
's answer here and Cary's as we're all trying to avoid the O(n*m)
scenario.
user system total real
select with include 34.610000 0.110000 34.720000 ( 34.924679)
reject with include 34.320000 0.100000 34.420000 ( 34.611992)
group and select 0.170000 0.010000 0.180000 ( 0.182358)
select by value 0.040000 0.000000 0.040000 ( 0.041073)
select with set 0.040000 0.000000 0.040000 ( 0.048331)
hashify then values 0.130000 0.010000 0.140000 ( 0.139686)
The code to reproduce this:
require 'benchmark'
require 'set'
list_size = 100_000
sub_list_size = 10_000
ah = Array.new(list_size) { |i| { id: i, value: "A" } }
array = []
sub_list_size.times { array << (0..list_size).to_a.sample }
def group_than_select(ah, array)
grouped = ah.group_by { |x| x[:id] }
good_keys = grouped.keys - array
good_keys.map { |i| grouped[i] }.flatten
end
def select_by_fl00r(ah, array)
array_hash = array.each_with_object({}){ |i, h| h[i] = true }
ah.select{ |obj| array_hash[obj[:id]] }
end
def select_with_set(ah, array)
array_to_set = array.to_set
ah.select { |h| array_to_set.include?(h[:id]) }
end
def hashify_then_values_at(ah, array)
h = ah.each_with_object({}) { |g,h| h.update(g[:id]=>g) }
h.values_at(*(h.keys & array))
end
Benchmark.bm(25) do |x|
x.report("select with include") do
ah.select{|el| array.include?(el[:id])}
end
x.report("reject with include") do
ah.reject{|e| !array.include?(e[:id])}
end
x.report("group and select") do
group_than_select(ah, array)
end
x.report("select by value") do
select_by_fl00r(ah, array)
end
x.report("select with set") do
select_with_set(ah, array)
end
x.report("hashify then values") do
hashify_then_values_at(ah, array)
end
end
Upvotes: 4
Reputation: 110725
Here are two more possibilities.
array = ["1", "2", "3", "4", "99999999"]
#1
I expect the include?
solutions would be considerably faster if array
were first converted to a set:
require 'set'
def select_with_set(ah, array)
array_to_set = array.to_set
ah.select { |h| array_to_set.include?(h[:id]) }
end
select_with_set(ah, array)
#=> [{:id=>"1", :value=>"A"}, {:id=>"2", :value=>"B"},
# {:id=>"3", :value=>"C"}, {:id=>"4", :value=>"D"}]
#2
If, as in the example, the hash elements of ah
have distinct values for :id
, one could do this:
def hashify_then_values_at(ah, array)
h = ah.each_with_object({}) { |g,h| h.update(g[:id]=>g) }
h.values_at(*(h.keys & array))
end
hashify_then_values_at(ah, array)
#=> [{:id=>"1", :value=>"A"}, {:id=>"2", :value=>"B"},
# {:id=>"3", :value=>"C"}, {:id=>"4", :value=>"D"}]
Upvotes: 1
Reputation: 15666
You can select the inverse - the hashes whose id is in array
by using this code:
ah.select{|el| array.include?(el[:id])}
If you prefer reject
, you can use:
ah.reject{|el| !array.include?(el[:id])}
For more info: Array#reject, Array#select. These methods create a new array, if you want to modify in place use Array#reject! or Array#select!.
Upvotes: 5
Reputation: 83680
For big pieces of data I would go with some preprocessing to avoid O(n*m)
lookups.
array = ["1", "2", "3", "4"]
array_hash = array.each_with_object({}){ |i, h| h[i] = true }
ah.select{ |obj| array_hash[obj[:id]] }
Upvotes: 4
Reputation: 5721
A better solution than rejecting those ids that are not in the array is to only accept the ones that do:
ah.select { |hash| array.include?(hash[:id]) }
Upvotes: 2