Reputation: 187
Which method is faster? Aren't they both the same?
import time
import numpy as np

start = time.time()
arr = np.array([1,2,3,4,5,6,7,8,9,0,12])
total_price = np.sum(arr[arr < 7]) * 2.14
print(total_price)
print('Duration: {} seconds'.format(time.time() - start))
start = time.time()
arr = np.array([1,2,3,4,5,6,7,8,9,0,12])
total_price = (arr[arr<7]).sum()* 2.14
print(total_price)
print('Duration: {} seconds'.format(time.time() - start))
Running the code again and again gives a different execution time for each method on every run. Sometimes the former method is faster and sometimes the latter.
Upvotes: 0
Views: 1481
Reputation: 231738
Removing the giant docstring, the code for np.sum is
@array_function_dispatch(_sum_dispatcher)
def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue,
        initial=np._NoValue, where=np._NoValue):
    if isinstance(a, _gentype):
        # 2018-02-25, 1.15.0
        warnings.warn(
            "Calling np.sum(generator) is deprecated, and in the future will give a different result. "
            "Use np.sum(np.fromiter(generator)) or the python sum builtin instead.",
            DeprecationWarning, stacklevel=3)
        res = _sum_(a)
        if out is not None:
            out[...] = res
            return out
        return res
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
                          initial=initial, where=where)
array_function_dispatch handles __array_function__ overrides that non-NumPy types might provide, while _wrapreduction is responsible for making sure np._NoValue isn't passed to the underlying implementation, as well as deciding whether to call the sum method (for non-array input) or add.reduce (for array input).
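To make that dispatch step concrete, here is a minimal sketch. The Basket class is hypothetical and exists only for illustration. It assumes a NumPy version whose _wrapreduction behaves as described above, that is, it looks for a sum method on non-array input before falling back to np.add.reduce.

```python
import numpy as np


class Basket:
    """Hypothetical non-array container with its own sum method."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0  # counts how often NumPy delegated to us

    def sum(self, axis=None, out=None, **kwargs):
        # np.sum is expected to call this with axis/out keyword arguments.
        self.calls += 1
        return sum(self.items)


basket = Basket([1, 2, 3])
basket_total = np.sum(basket)   # delegated to Basket.sum
list_total = np.sum([1, 2, 3])  # a list has no sum method, so add.reduce is used

print(basket_total, basket.calls, list_total)
```

If the assumption holds, basket.calls ends up at 1, showing that np.sum handed the work to the object's own method instead of converting it to an array.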
So np.sum does a bunch of checks to handle non-array inputs, then eventually passes the task to np.add.reduce if the input is an array.
Meanwhile, np.ndarray.sum is this:
static PyObject *
array_sum(PyArrayObject *self, PyObject *args, PyObject *kwds)
{
    NPY_FORWARD_NDARRAY_METHOD("_sum");
}
where NPY_FORWARD_NDARRAY_METHOD is a macro that forwards the operation to numpy.core._methods._sum:
def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
         initial=_NoValue, where=True):
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
and umr_sum is an alias for np.add.reduce.
Both code paths eventually end up at np.add.reduce, but the ndarray.sum code path doesn't involve all the pre-check work for non-array input, because the array already knows it's an array.
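As a quick sanity check that the three entry points agree, here is a small sketch using the array from the question. The variable names are mine, not NumPy's.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 12])
cheap = arr[arr < 7]  # boolean mask keeps 1, 2, 3, 4, 5, 6, 0

via_function = np.sum(cheap)      # function: pre-checks, then add.reduce
via_method = cheap.sum()          # method: forwarded to _sum, then add.reduce
via_ufunc = np.add.reduce(cheap)  # the reduction both paths end at

print(via_function, via_method, via_ufunc)
```

All three return the same value, so the choice between them only affects call overhead, not the result.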
In these tests the calculation time itself is small enough that the extensive prechecks make a big difference:
In [607]: timeit np.sum(np.arange(1000))
15.4 µs ± 42.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [608]: timeit np.arange(1000).sum()
12.2 µs ± 29.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [609]: timeit np.add.reduce(np.arange(1000))
9.19 µs ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
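Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. This is only a rough sketch: the repeat and number values are arbitrary choices of mine, and unlike the timings above the array is built once outside the timed call, so the absolute numbers will differ.

```python
import timeit

import numpy as np

arr = np.arange(1000)  # built once, so only the summation is timed

candidates = {
    "np.sum(arr)": lambda: np.sum(arr),
    "arr.sum()": lambda: arr.sum(),
    "np.add.reduce(arr)": lambda: np.add.reduce(arr),
}

number = 2000  # calls per timing run (arbitrary)
best = {}
for label, func in candidates.items():
    runs = timeit.repeat(func, repeat=5, number=number)
    # The minimum over repeats is the run least disturbed by other processes.
    best[label] = min(runs) / number
    print("{}: {:.2f} µs per call".format(label, best[label] * 1e6))
```

Averaging over many calls like this is what makes the difference visible. A single time.time() measurement, as in the question, is dominated by noise at this scale.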
numpy has a number of function/method pairs like this. Use whichever is most convenient - and looks prettiest in your code!
Upvotes: 5