Sean

Reputation: 313

Ruby: Increasing Efficiency

I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at scale. After benchmarking, the average time to execute the line of code below is about 0.004 sec. Its goal is to find the absolute difference between the two values at each array position. In a previous operation, 111.111 was loaded into the arrays at positions that contained invalid data. Because of some odd time-domain issues, I couldn't just remove those values and needed a distinguishable placeholder; I could probably use 'nil' for that instead. This line of code checks that neither array holds the 111.111 placeholder at the current position. If both values are valid, I perform the subtraction; otherwise I want to delete the values (or at least exclude them from the new array I'm writing to). I accomplish this by placing a 'nil' at that position and then compacting the array afterwards.

The time of 0.004 sec for 4000 data points in each array isn't terrible, but this line of code is executed 25M times, which adds up to roughly 100,000 seconds (about 28 hours). I'm hoping someone can offer some insight into how I might optimize it.

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact
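If the invalid slots held nil instead of 111.111, as the question suggests, the validity check and the compact step could collapse into a single pass. A minimal sketch, assuming Ruby 2.7+ (for Enumerable#filter_map) and nil in the invalid slots; row_diffs is a hypothetical helper name:

```ruby
# Hypothetical helper: absolute differences of paired values,
# skipping any position where either row holds nil.
def row_diffs(row_1, row_2)
  # filter_map drops nil block results, so the `if` guard both filters
  # out invalid pairs and removes the need for a separate compact pass.
  row_1.zip(row_2).filter_map { |x, y| (x - y).abs if x && y }
end

p row_diffs([1, nil, 2], [2, nil, 4]) # prints [1, 2]
```

Note that `x && y` only treats nil (and false) as invalid, so legitimate zero values are kept.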

Upvotes: 1

Views: 102

Answers (2)

the Tin Man

Reputation: 160551

You are wasting CPU generating nil in the ternary statement and then using compact to remove those nils. Instead, use reject or select to filter out the pairs containing 111.111, then map over what remains.

Instead of:

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact
temp_row # => [1, 2]

I'd start with:

temp_row = row_1.zip(row_2)
                .reject{ |x,y| x == 111.111 || y == 111.111 }
                .map{ |x,y| (x - y).abs }
temp_row # => [1, 2]
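The select form mentioned at the top of this answer keeps the valid pairs instead of rejecting the invalid ones. A minimal sketch; PLACEHOLDER is a hypothetical constant name for the 111.111 sentinel:

```ruby
PLACEHOLDER = 111.111 # hypothetical name for the sentinel value

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

# Keep only pairs where neither value is the placeholder,
# then compute the absolute difference of each surviving pair.
temp_row = row_1.zip(row_2)
                .select { |x, y| x != PLACEHOLDER && y != PLACEHOLDER }
                .map { |x, y| (x - y).abs }
temp_row # => [1, 2]
```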

Or:

temp_row = row_1.zip(row_2)
                .each_with_object([]) { |(x,y), ary|
                  ary << (x - y).abs unless (x == 111.111 || y == 111.111)
                }
temp_row # => [1, 2]
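All of the versions above first build the array of two-element pairs that zip returns. A plain index loop avoids allocating those pairs. This is a sketch assuming both rows have the same length; diff_without_placeholder and PLACEHOLDER are hypothetical names, and whether it actually beats reject/map on a given Ruby version is something to confirm with Benchmark:

```ruby
PLACEHOLDER = 111.111 # hypothetical name for the sentinel value

# Absolute differences of row_1[i] and row_2[i], skipping any index
# where either row holds the placeholder. Assumes equal-length rows.
def diff_without_placeholder(row_1, row_2)
  result = []
  i = 0
  n = row_1.size
  while i < n
    x = row_1[i]
    y = row_2[i]
    result << (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
    i += 1
  end
  result
end

p diff_without_placeholder([1, 111.111, 2], [2, 111.111, 4]) # prints [1, 2]
```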

Benchmarking the approaches against each other shows some useful differences:

require 'benchmark'

DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle

Benchmark.bm(17) do |b|
  b.report('ternary:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).map do |x, y| 
        x == 111.111 || y == 111.111 ? nil : (x - y).abs 
      end.compact
    end
  end

  b.report('reject:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
    end
  end

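  b.report('each_with_object:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2)
           .each_with_object([]) { |(x,y), ary|
             # Use << so the memo array is mutated; `ary += [...]` would only
             # rebind the block variable and return an empty array.
             ary << (x - y).abs unless (x == 111.111 || y == 111.111)
           }
    end
  end
end

# >>                        user     system      total        real
# >> ternary:           0.240000   0.000000   0.240000 (  0.244476)
# >> reject:            0.060000   0.000000   0.060000 (  0.058842)
# >> each_with_index:   0.350000   0.000000   0.350000 (  0.349363)

The each_with_index: row in that output was produced by a version of the third report that used `ary += [(x - y).abs]`, which allocates throwaway arrays for every element and, because it only rebinds the block variable, returns an empty array; it does not reflect the each_with_object version shown above. Adjust DECIMAL_SHIFT (the number of repetitions), the size of DATA_ARRAY, and the placement of 111.111, then rerun the benchmark to see which expressions work best for your data size and structure, and fine-tune the code as necessary.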
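Timing numbers only mean something if every variant returns the same array, so it can help to assert equivalence before benchmarking. A minimal sketch reusing this answer's data setup; the VARIANTS hash and its labels are hypothetical:

```ruby
require 'benchmark'

DATA_ARRAY = (1..1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map { |i| i * 2 } + [111.111]).shuffle

# Each lambda implements the same transformation in a different way.
VARIANTS = {
  'ternary:' => lambda { |a, b|
    a.zip(b).map { |x, y| x == 111.111 || y == 111.111 ? nil : (x - y).abs }.compact
  },
  'reject:' => lambda { |a, b|
    a.zip(b).reject { |x, y| x == 111.111 || y == 111.111 }.map { |x, y| (x - y).abs }
  },
  'each_with_object:' => lambda { |a, b|
    a.zip(b).each_with_object([]) { |(x, y), ary| ary << (x - y).abs unless x == 111.111 || y == 111.111 }
  }
}

# Refuse to time anything until all variants agree on the result.
expected = VARIANTS.values.first.call(ROW_1, ROW_2)
VARIANTS.each do |label, fn|
  raise "#{label} returned a different result" unless fn.call(ROW_1, ROW_2) == expected
end

Benchmark.bm(17) do |bm|
  VARIANTS.each do |label, fn|
    bm.report(label) { 100.times { fn.call(ROW_1, ROW_2) } }
  end
end
```

A variant that silently returns the wrong array (for example, an empty one) fails the check instead of showing up as a misleading timing.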

Upvotes: 2

Ruslan

Reputation: 2009

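You can try the parallel gem (https://github.com/grosser/parallel) to spread the work across multiple CPU cores. On MRI, the global VM lock keeps threads from running Ruby code in parallel, so for CPU-bound work like this use worker processes (Parallel's default) rather than threads.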
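Parallelism across cores can also be sketched with the standard library alone, using forked worker processes, IO.pipe, and Marshal. This is not how the parallel gem is implemented; it assumes Ruby 2.7+ (for filter_map) and a platform with fork (not Windows or JRuby), and diff_row, parallel_diff_rows, and PLACEHOLDER are hypothetical names:

```ruby
PLACEHOLDER = 111.111 # hypothetical name for the question's sentinel

# Per-row operation from the question, done in a single pass (Ruby 2.7+).
def diff_row(row_1, row_2)
  row_1.zip(row_2).filter_map do |x, y|
    (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
  end
end

# Split [row_1, row_2] pairs into one slice per worker, process each slice
# in a forked child, and send the results back to the parent over a pipe.
def parallel_diff_rows(row_pairs, workers: 4)
  return [] if row_pairs.empty?

  slice_size = (row_pairs.size / workers.to_f).ceil
  readers = row_pairs.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    fork do
      reader.close
      writer.write(Marshal.dump(slice.map { |r1, r2| diff_row(r1, r2) }))
      writer.close
      exit!(0) # skip at_exit hooks inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    reader
  end

  # Read the slices back in order, so output order matches input order.
  results = readers.flat_map do |reader|
    rows = Marshal.load(reader.read)
    reader.close
    rows
  end
  Process.waitall
  results
end

pairs = Array.new(8) { |i| [[i, 111.111, i + 1], [2 * i, 5, 111.111]] }
p parallel_diff_rows(pairs) == pairs.map { |r1, r2| diff_row(r1, r2) } # prints true
```

Whether this pays off depends on how much work each slice does relative to the cost of forking and marshaling the results back, so it is worth benchmarking against the single-process version on real data.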

Upvotes: 0
