mumbala

Reputation: 121

Why is numpy ma.average 24 times slower than arr.mean?

I've found something interesting in Python's NumPy: np.ma.average is a lot slower than arr.mean (where arr is a NumPy array).

>>> import numpy as np
>>> arr = np.full((3, 3), -9999, dtype=float)
>>> arr
array([[-9999., -9999., -9999.],
       [-9999., -9999., -9999.],
       [-9999., -9999., -9999.]])

%timeit np.ma.average(arr, axis=0)
The slowest run took 49.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop

%timeit arr.mean(axis=0)
The slowest run took 6.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.41 µs per loop

With random numbers:

arr = np.random.random((3,3))

%timeit arr.mean(axis=0)
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.78 µs per loop

%timeit np.ma.average(arr, axis=0)
1000 loops, best of 3: 186 µs per loop

--> That's nearly 24 times slower.

Documentation

numpy.ma.average(a, axis=None, weights=None, returned=False)

Return the weighted average of array over the given axis.

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Compute the arithmetic mean along the specified axis.


Why is ma.average so much slower than arr.mean? Mathematically they are the same (correct me if I'm wrong). My guess is that it has something to do with the weighting options of ma.average, but shouldn't there be a fallback if no weights are passed?
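For reference, a minimal check (a sketch, using the same unmasked float array as above) confirming that the two calls really do produce the same values despite the timing gap:

```python
import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

# With no weights and no mask involved, both compute a plain column-wise mean.
plain = arr.mean(axis=0)
masked = np.ma.average(arr, axis=0)

# np.ma.average returns a MaskedArray; its data matches arr.mean exactly.
print(np.allclose(plain, np.asarray(masked)))  # True
```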

Upvotes: 2

Views: 1739

Answers (2)

mumbala

Reputation: 121

Thanks to @WillemVanOnsem and @sascha in the comments above.

Edit: this applies to small arrays; see the accepted answer for more information.

  • Masked operations are slow; try to avoid them:

    mask = self.local_pos_history[:, 0] > -9
    local_pos_hist_masked = self.local_pos_history[mask]
    avg = local_pos_hist_masked.mean(axis=0)
    

    Old version with masking:

    mask = np.ma.masked_where(self.local_pos_history > -9, self.local_pos_history)
    local_pos_hist_mask = self.local_pos_history[mask].reshape(len(self.local_pos_history) // 3, 3)
    avg_pos = local_pos_hist_mask.mean(axis=0)
    
  • np.average is nearly as fast as arr.mean:

    %timeit np.average(arr, axis=0)
    The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 9.89 µs per loop
    
    %timeit np.mean(arr, axis=0)
    The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.74 µs per loop
    

Just for clarification: these are still tests on a small array.
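As a self-contained sketch of the boolean-indexing approach from the first bullet (the array, its contents, and the sentinel value -9999 are illustrative, not taken from the original code):

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [-9999.0, -9999.0, -9999.0],
                 [4.0, 5.0, 6.0]])

# Keep only rows whose first column holds a real value (sentinel rows are -9999).
mask = data[:, 0] > -9
avg = data[mask].mean(axis=0)
print(avg)  # [2.5 3.5 4.5]
```

Plain boolean indexing produces an ordinary ndarray, so the subsequent mean runs without any masked-array overhead.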

Upvotes: 0

MSeifert

Reputation: 152765

A good way to find out why something is slower is to profile it. I'll use the third-party library line_profiler and the IPython command %lprun (see for example this blog) here:

%load_ext line_profiler

import numpy as np
arr = np.full((3, 3), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1         1810   1810.0     30.5      a = asarray(a)
   571         1           15     15.0      0.3      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            5      5.0      0.1      if weights is None:
   576         1         3500   3500.0     59.0          avg = a.mean(axis)
   577         1          591    591.0     10.0          scl = avg.dtype.type(a.count(axis))
   578                                               else: 
   ...
   608                                           
   609         1            7      7.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            5      5.0      0.1          return avg

I removed some irrelevant lines.

So actually 30% of the time is spent in np.ma.asarray (something that arr.mean doesn't have to do!).
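To see that conversion cost in isolation, one can time just the np.ma.asarray call next to the plain mean (a sketch; the absolute numbers will vary by machine):

```python
import timeit
import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

# Wrapping a plain ndarray into a MaskedArray has a fixed per-call cost
# that arr.mean never pays.
t_asarray = timeit.timeit(lambda: np.ma.asarray(arr), number=10000)
t_mean = timeit.timeit(lambda: arr.mean(axis=0), number=10000)
print(f"np.ma.asarray: {t_asarray:.4f}s, arr.mean: {t_mean:.4f}s for 10000 calls")
```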

However the relative times change drastically if you use a bigger array:

arr = np.full((1000, 1000), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1          609    609.0      7.6      a = asarray(a)
   571         1           14     14.0      0.2      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            7      7.0      0.1      if weights is None:
   576         1         6924   6924.0     86.9          avg = a.mean(axis)
   577         1          404    404.0      5.1          scl = avg.dtype.type(a.count(axis))
   578                                               else:
   ...
   609         1            6      6.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            6      6.0      0.1          return avg

This time the np.ma.MaskedArray.mean function takes up almost 90% of the time.

Note: You could also dig deeper and look into np.ma.asarray or np.ma.MaskedArray.count or np.ma.MaskedArray.mean and check their line profilings. But I just wanted to show that there are lots of called functions that add to the overhead.

So the next question is: did the relative timings between np.ndarray.mean and np.ma.average also change? On my computer, the difference is now much smaller:

%timeit np.ma.average(arr, axis=0)
# 2.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr.mean(axis=0)
# 1.84 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This time it's not even 2 times slower. I assume for even bigger arrays the difference will get even smaller.


This is also something that is actually quite common with NumPy:

The constant factors are quite high even for plain numpy functions (see for example my answer to the question "Performance in different vectorization method in numpy"). For np.ma these constant factors are even bigger, especially if you don't use a np.ma.MaskedArray as input. But even though the constant factors might be high, these functions excel with big arrays.

Upvotes: 3
