Reputation: 313
I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at-scale. After benchmarking, the average time to execute this string of code is about 0.004sec. The goal of this line of code is to find the difference between the two values in each array location. In a previous operation, 111.111 was loaded into the arrays in locations which contained invalid data. Due to some weird time domain issues, I needed to do this because I couldn't just remove the values and I needed some distinguishable placeholder. I could probably use 'nil' here instead. Anyways, back to the explanation. This line of code checks to ensure neither array has this 111.111 placeholder in the current location. If the values are valid then I perform the mathematical operation, otherwise I want to delete the values (or at least exclude them from the new array to which I'm writing). I accomplished this by place a 'nil' in that location and then compacting the array afterwards.
The time of 0.004sec for 4000 data points in each array isn't terrible but this line of code is executed 25M times. I'm hoping someone might be able to offer some insight into how I might optimize this line of code.
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
Upvotes: 1
Views: 102
Reputation: 160551
You are wasting CPU generating nil
in the ternary statement, then using compact
to remove them. Instead, use reject
or select
to find elements not containing 111.111
then map
or something similar.
Instead of:
row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
temp_row # => [1, 2]
I'd start with:
temp_row = row_1.zip(row_2)
.reject{ |x,y| x == 111.111 || y == 111.111 }
.map{ |x,y| (x - y).abs }
temp_row # => [1, 2]
Or:
temp_row = row_1.zip(row_2)
.each_with_object([]) { |(x,y), ary|
ary << (x - y).abs unless (x == 111.111 || y == 111.111)
}
temp_row # => [1, 2]
Benchmarking different size arrays shows good things to know:
require 'benchmark'
DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle
Benchmark.bm(16) do |b|
b.report('ternary:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
end
end
b.report('reject:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
end
end
b.report('each_with_index:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2)
.each_with_object([]) { |(x,y), ary|
ary += [(x - y).abs] unless (x == 111.111 || y == 111.111)
}
end
end
end
# >> user system total real
# >> ternary: 0.240000 0.000000 0.240000 ( 0.244476)
# >> reject: 0.060000 0.000000 0.060000 ( 0.058842)
# >> each_with_index: 0.350000 0.000000 0.350000 ( 0.349363)
Adjust the size of DECIMAL_SHIFT
and DATA_ARRAY
and the placement of 111.111
and see what happens to get an idea of what expressions work best for your data size and structure and fine-tune the code as necessary.
Upvotes: 2
Reputation: 2009
You can try the parallel
gem https://github.com/grosser/parallel and run it on multiple threads
Upvotes: 0