Thomas Leonard
Thomas Leonard

Reputation: 1047

Iterate on an array with two implicit loops

Is it possible to iterate implicitly on an array with two indices? Here is a very simple example of what I would like to do :

import numpy as np

x = np.arange(3)
y = np.zeros(3)

for i in range(3):
    y[i] = np.sum(x - x[i])

There is an implicit loop (the sum) and an explicit one (for i in range(3))... Is it possible to have a totally implicit version of that?

Upvotes: 3

Views: 1597

Answers (2)

Jaime
Jaime

Reputation: 67437

When possible, you should always try to use mathematics before computer science. Your expression, y[i] = np.sum(x - x[i]) can be rewritten with a little algebra as y[i] = np.sum(x) - x.size * x[i]. This makes it very clear that you can rewrite your code without any loops as:

y = np.sum(x) - x.size * x

As should be obvious, for large arrays it runs much faster than @JoshAdel's solution, x400 faster for an input of size 1000:

>>> x = np.random.normal(size=(1000,))
>>> np.allclose(np.sum(x - x[:,None], 1), np.sum(x) - x.size * x)
True

%timeit np.sum(x - x[:,None], 1)
100 loops, best of 3: 6.33 ms per loop

%timeit np.sum(x) - x.size * x
100000 loops, best of 3: 16.5 us per loop

Upvotes: 4

JoshAdel
JoshAdel

Reputation: 68682

The following should work:

y = np.sum(x - x[:,None], axis=1)

This solution is using numpy's broadcasting facility. First I'm recasting x which has a shape of (N,) to (N,1) using x[:,None]. You might also see this written as x[:,np.newaxis].

x - x[:,None] creates an (N,N) array whose elements are tmp_{i,j} = x_i - x_j. I then just sum across the rows using the argument axis=1 in np.sum.

See:

In [13]: y = np.zeros(10)

In [14]: x = np.random.normal(size=(10,))

In [15]: for i in range(10):
        y[i] = np.sum(x - x[i])
   ....:

In [16]: y
Out[16]:
array([  7.99781458,   4.15114434, -17.24655912, -20.35606168,
        -5.0211756 ,   7.52062868,   8.2501526 ,   3.90397351,
        10.18746451,   0.61261819])

In [17]: np.sum(x - x[:,None], 1)
Out[17]:
array([  7.99781458,   4.15114434, -17.24655912, -20.35606168,
        -5.0211756 ,   7.52062868,   8.2501526 ,   3.90397351,
        10.18746451,   0.61261819])

In [18]: np.allclose(y, np.sum(x - x[:,None], 1))
Out[18]: True

Timings: Just to point out that using the facilities that numpy provides to operate on arrays is often significantly faster than using standard Python constructs:

In [48]: x = np.random.normal(size=(100,))

In [49]: %timeit y = np.array([sum(x - k) for k in x])
100 loops, best of 3: 6.86 ms per loop

In [67]: %timeit y = np.array([np.sum(x - k) for k in x])
1000 loops, best of 3: 1.54 ms per loop

In [50]: %timeit np.sum(x - x[:,None], 1)
10000 loops, best of 3: 59 µs per loop

In [51]:

In [51]: x = np.random.normal(size=(1000,))

In [52]: %timeit y = np.array([sum(x - k) for k in x])
1 loops, best of 3: 592 ms per loop

In [72]: %timeit y = np.array([np.sum(x - k) for k in x])
100 loops, best of 3: 17.2 ms per loop

In [53]: %timeit np.sum(x - x[:,None], 1)
100 loops, best of 3: 8.67 ms per loop

Upvotes: 3

Related Questions