user4740054
user4740054

Reputation: 13

What is the fastest way to calculate a rolling average of an array with Ruby?

What is the fastest way to calculate a x second rolling average of an array in ruby?

I have two arrays of data from a bicycle ride. The time is when the corresponding speed value was read during the ride. You'll notice that the readings were not taken every second. For this reason I don't believe I can just increment the rolling array by one.

speed = [0, 15, 17, 19, 18, 22, 24, 28, 22, 17, 16, 14, 15, 15, 15, 0, 15, 19, 21, 25, 26, 24, 24] 
time = [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 20, 21, 22, 23, 25, 26, 27] 

I have tried something like the following (calculates a rolling 5 second average and puts it into an array), but it's pretty slow for large arrays and multiple intervals (takes 8 minutes to calculate all the intervals from a 1 hour bike ride, 1..3600):

duration = time.max

interval_average = []
time_hash = Hash[time.map.with_index.to_a] 

roll_start = 0
roll_stop = 5

for i in 1..(duration+1) do
    start = time_hash[roll_start]
    stop = time_hash[roll_stop]

    rolling_array = speed[start..stop]

    avg_value = mean(rolling_array)

    interval_average.push(avg_value)

    roll_start += 1
    roll_stop += 1
end

In my own code I'm ignoring the exceptions and pushing 0 instead, as I'm really just interested in finding the max of the x second averages in the end.

Upvotes: 1

Views: 955

Answers (1)

O-I
O-I

Reputation: 1575

I'm not sure about its performance, but here's another approach that you can test for finding the maximum of rolling averages over some fixed length of time.

speed = [0, 15, 17, 19, 18, 22, 24, 28, 22, 17, 16, 14, 15, 15, 15, 0, 15, 19, 21, 25, 26, 24, 24] 
time = [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 20, 21, 22, 23, 25, 26, 27] 

interval_length = 5 # seconds

speed.zip(time)                                                     # 1
     .each_cons(interval_length)                                    # 2
     .select { |i| i.last.last - i.first.last == interval_length}   # 3
     .map { |i| i.map(&:first).reduce(:+) / interval_length.to_f }  # 4
     .max                                                           # 5

Breaking it down into components with intermediate results:

1) Pair each speed reading with the time it was taken.

# => [[0, 0], [15, 1], [17, 2], [19, 3], [18, 5], [22, 6], [24, 7], [28, 8], [22, 10], [17, 11], [16, 12], [14, 13], [15, 15], [15, 16], [15, 17], [0, 18], [15, 20], [19, 21], [21, 22], [25, 23], [26, 25], [24, 26], [24, 27]]

2) Section off the above into consecutive runs of interval_length, in this case 5. This will give you an Enumerator object, but using to_a we can see the intermediate result looks like this:

# => [[15, 1], [17, 2], [19, 3], [18, 5], [22, 6]], [[17, 2], [19, 3], [18, 5], [22, 6], [24, 7]], [[19, 3], [18, 5], [22, 6], [24, 7], [28, 8]], [[18, 5], [22, 6], [24, 7], [28, 8], [22, 10]], [[22, 6], [24, 7], [28, 8], [22, 10], [17, 11]], [[24, 7], [28, 8], [22, 10], [17, 11], [16, 12]], [[28, 8], [22, 10], [17, 11], [16, 12], [14, 13]], [[22, 10], [17, 11], [16, 12], [14, 13], [15, 15]], [[17, 11], [16, 12], [14, 13], [15, 15], [15, 16]], [[16, 12], [14, 13], [15, 15], [15, 16], [15, 17]], [[14, 13], [15, 15], [15, 16], [15, 17], [0, 18]], [[15, 15], [15, 16], [15, 17], [0, 18], [15, 20]], [[15, 16], [15, 17], [0, 18], [15, 20], [19, 21]], [[15, 17], [0, 18], [15, 20], [19, 21], [21, 22]], [[0, 18], [15, 20], [19, 21], [21, 22], [25, 23]], [[15, 20], [19, 21], [21, 22], [25, 23], [26, 25]], [[19, 21], [21, 22], [25, 23], [26, 25], [24, 26]], [[21, 22], [25, 23], [26, 25], [24, 26], [24, 27]

3) Since you don't have information for every second, some of each chunk of speed values may be over time intervals that are not really interval_length seconds long. So, let's restrict our calculations only to those. For 5 seconds, it happens that no data needs to be dropped and the intermediate result is the same as step 2.

4) Now we can take the rolling average:

 # => [13.8, 18.2, 20.0, 22.2, 22.8, 22.6, 21.4, 19.4, 16.8, 15.4, 15.0, 11.8, 12.0, 12.8, 14.0, 16.0, 21.2, 23.0, 24.0]

5) And the maximum thereof:

# => 24.0

Again, I'm not sure how this will fare on a really large data set, but it might be worth a try.

Upvotes: 2

Related Questions