Reputation: 155
I'm currently facing this question. For example, I have this array of hashes:
data = [
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" },
  { :id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20" }
]
I want to find the exact hash whose start_date..end_date range contains "2015-01-04".
Following the documentation, I found three ways to do this:
1) Use select
finding_hash = data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
finding_hash will be an array containing the matching hashes.
But in my case I'm sure there will only ever be one hash matching the condition, so after the select I have to call finding_hash.first to get the hash I want.
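Written as one chain (found is just an illustrative name):
# select returns an Array even when only one hash matches, hence the .first
found = data.select { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }.first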
2) Use find
finding_hash = data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
Done this way, finding_hash IS the hash I need.
3) Traditional loop
data.each do |t|
  if t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04"
    return t # assumes the loop runs inside a method
  end
end
So which one is the fastest way to do this? I need the performance because my data is quite big!
Thank you, and sorry for my bad English!
Upvotes: 4
Views: 311
Reputation: 8821
You can test with a benchmark. For example:
require 'benchmark'

n = 1000000
data = [
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" },
  { :id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20" }
]

Benchmark.bm do |x|
  x.report do
    n.times do
      data.select { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
    end
  end
  x.report do
    n.times do
      data.find { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
    end
  end
  x.report do
    n.times do
      finding_hash = {}
      data.each do |t|
        if t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04"
          finding_hash = t
          break
        end
      end
    end
  end
end
output:
user system total real
1.490000 0.020000 1.510000 ( 1.533589)
1.070000 0.010000 1.080000 ( 1.096578)
1.000000 0.010000 1.010000 ( 1.011021)
The test results depend on the value of n and the size of the data.
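If warm-up effects are a concern, Benchmark.bmbm from the same stdlib module runs a rehearsal pass before the measured pass; a minimal sketch reusing n and data from above:
require 'benchmark'

Benchmark.bmbm do |x|
  # rehearsal timings are printed first, then the real measurements
  x.report('select') { n.times { data.select { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" } } }
  x.report('find')   { n.times { data.find   { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" } } }
end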
Upvotes: 2
Reputation: 1596
All of those variants have O(n) complexity.
If your ranges are non-overlapping, you can use Array#bsearch, which is O(log n). You have to sort your ranges first, as in the sketch below.
sorted = data.sort_by { |x| x[:start_date] }
sorted.bsearch { |x| ..check if range of `x` includes value.. }
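A minimal runnable version of that block (target and found are names introduced for the example) uses bsearch's find-any mode, where returning 0 stops at the current element, a negative number searches left of it, and a positive number searches right:
target = "2015-01-04" # ISO-8601 strings compare correctly as plain strings
found = sorted.bsearch do |x|
  if target < x[:start_date]
    -1 # target lies before this range; search the left half
  elsif target > x[:end_date]
    1  # target lies after this range; search the right half
  else
    0  # this range contains the target
  end
end
# found is nil if no range contains the target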
Upvotes: 1
Reputation: 24337
All the methods you've tried are Enumerable methods, but the native Array methods are faster. Try find_index. Even after having to make a separate call to load the hash, it's still about 20% faster than the next fastest:
index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
x = data[index]
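One caveat: find_index returns nil when nothing matches, so if a match isn't guaranteed you'd want a guard before indexing (data[nil] raises a TypeError):
index = data.find_index { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
x = index && data[index] # nil when there is no match, instead of raising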
My benchmarks:
require 'benchmark'

n = 1_000_000
data = [
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" },
  { :id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20" }
]

Benchmark.bm do |x|
  x.report 'Enumerable#select' do
    n.times do
      data.select do |h|
        h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
      end
    end
  end
  x.report 'Enumerable#detect' do
    n.times do
      data.detect do |h|
        h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
      end
    end
  end
  x.report 'Enumerable#each  ' do
    n.times do
      finding_hash = {}
      data.each do |t|
        if t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04"
          finding_hash = t
          break
        end
      end
    end
  end
  x.report 'Array#find_index ' do
    n.times do
      index = data.find_index { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
      found = data[index] # avoid reassigning x, which is the Benchmark reporter
    end
  end
end
Results are:
Enumerable#select 1.000000 0.010000 1.010000 ( 1.002282)
Enumerable#detect 0.790000 0.000000 0.790000 ( 0.797319)
Enumerable#each 0.620000 0.000000 0.620000 ( 0.627272)
Array#find_index 0.520000 0.000000 0.520000 ( 0.515691)
Upvotes: 2
Reputation: 45943
v3 is fastest:
def v1
  @data.select { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
end

def v2
  @data.find { |h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04" }
end

def v3
  @data.each do |t|
    if t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04"
      return t
    end
  end
end
select will always be slowest because it has to iterate through the entire array. I'm not sure why find is slower than v3; it may have to do with overhead.
However, find and v3 may be the same for your data. The results below are not necessarily valid for your data.
t = Time.now; 10000.times{ v1 }; Time.now - t
=> 0.014131
t = Time.now; 10000.times{ v2 }; Time.now - t
=> 0.013138
t = Time.now; 10000.times{ v3 }; Time.now - t
=> 0.008799
Running this on the sample data is not the same thing as running it on your real data.
If the real data is too large, you can run it on a subset of the data to get a better answer.
BTW, you can rewrite v3 as:
data.each do |t|
  break t if t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04"
end
FWIW, operating on an array is going to be very unwieldy and slow. You may want to save it in a database and run a query. For a large dataset, that would probably be at least 2 orders of magnitude faster.
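As a sketch of that idea with a hypothetical ActiveRecord model (the DateRange name and its columns are assumptions, not something from the question), the lookup becomes a single query the database can serve from an index:
require 'active_record'

# Hypothetical model; the backing table would index start_date and end_date
class DateRange < ActiveRecord::Base
end

# The database performs the range check instead of Ruby iterating an array
found = DateRange.where("start_date <= ? AND end_date >= ?", "2015-01-04", "2015-01-04").first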
Upvotes: 1