Physicist

Reputation: 3048

How to take the average of each portion of an array

Say I have a 500000x1 array called A. I want to divide this array into 1000 equal sections and then calculate the mean of each section. So I will end up with a 1000x1 array called B, in which B[1] is the mean of A[1:500], B[2] is the mean of A[501:1000], and so on. Since I will be doing this many times, I want to do it efficiently. What's the most efficient way of doing this in Matlab/Python?

Upvotes: 1

Views: 165

Answers (1)

Divakar

Reputation: 221624

NumPy/Python

We could reshape to have 500 columns and then compute the average along the second axis -

A.reshape(-1,500).mean(axis=1)

Sample run -

In [89]: A = np.arange(50)+1;

In [90]: A.reshape(-1,5).mean(1)
Out[90]: array([  3.,   8.,  13.,  18.,  23.,  28.,  33.,  38.,  43.,  48.])
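The reshape only works when the array length is an exact multiple of the block size. A minimal sketch of wrapping the idea in a reusable function with that check; `block_means` is a hypothetical helper name, not something from NumPy, and it flattens the input so a 500000x1 column array works too -

```python
import numpy as np

def block_means(a, block_size):
    """Mean of each consecutive block of `block_size` elements.

    Hypothetical helper for illustration; assumes the input can be
    flattened and its length is a multiple of `block_size`.
    """
    a = np.asarray(a).ravel()  # accept (N,) as well as (N, 1) inputs
    if a.size % block_size != 0:
        raise ValueError("array length must be a multiple of block_size")
    # One row per block, then average across each row
    return a.reshape(-1, block_size).mean(axis=1)

A = np.arange(50) + 1
B = block_means(A, 5)
```

For the array in the question this would be called as block_means(A, 500), giving 1000 values.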

Runtime test :

An alternative is the old-fashioned way: compute the sum and then divide by the number of elements involved in the summation. Let's time these two methods -

In [107]: A = np.arange(500000)+1;

In [108]: %timeit A.reshape(-1,500).mean(1)
1000 loops, best of 3: 1.19 ms per loop

In [109]: %timeit A.reshape(-1,500).sum(1)/500.0
1000 loops, best of 3: 583 µs per loop

That seems like quite an improvement with the alternative method! But wait: for integer input, the mean method converts to a float type by default, and that conversion overhead showed up here.
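The dtype difference can be checked directly. A small sketch, with illustrative variable names that are not part of the timing session above: on integer input, mean returns floats, while sum stays in an integer type until the division -

```python
import numpy as np

A_int = np.arange(500000) + 1               # integer input, as in the timing run

m = A_int.reshape(-1, 500).mean(axis=1)     # mean promotes to float64 internally
s = A_int.reshape(-1, 500).sum(axis=1)      # sum keeps an integer dtype
avg = s / 500.0                             # the division produces floats

print(m.dtype, s.dtype, avg.dtype)
```

Both routes give the same averages; only the point at which the float conversion happens differs.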

So, if we use float type input arrays, we get a different and fairer comparison -

In [144]: A = np.arange(500000).astype(float)+1;

In [145]: %timeit A.reshape(-1,500).mean(1)
1000 loops, best of 3: 534 µs per loop

In [146]: %timeit A.reshape(-1,500).sum(1)/500.0
1000 loops, best of 3: 516 µs per loop

MATLAB

With column-major ordering, we would reshape to have 500 rows and then average along the first dimension -

mean(reshape(A,500,[]),1)

Sample run -

>> A = 1:50;                      
>> mean(reshape(A,5,[]),1)        
ans =
     3     8    13    18    23    28    33    38    43    48
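To make the link between the two versions concrete, the MATLAB layout can be mimicked in NumPy by reshaping in column-major (Fortran) order. This is only a sketch for comparison, not a suggestion to use it over the row-major version -

```python
import numpy as np

A = np.arange(50) + 1

# Row-major (NumPy default): one block per row, average along axis 1
row_major = A.reshape(-1, 5).mean(axis=1)

# Column-major, as in MATLAB: one block per column, average along axis 0
col_major = A.reshape(5, -1, order='F').mean(axis=0)

print(np.array_equal(row_major, col_major))
```

In both cases each block of consecutive elements sits along the axis that is contiguous in memory, which is why the reshape itself is cheap.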

Runtime test :

Let's try out the old-fashioned way here too -

>> A = 1:500000;
>> func1 = @() mean(reshape(A,500,[]),1);
>> timeit(func1)                         
ans =
    0.0013021
>> func2 = @() sum(reshape(A,500,[]),1)/500.0;
>> timeit(func2)                              
ans =
    0.0012291

Upvotes: 3
