Reputation: 263
I have a large numpy array. Is there a way to subtract each element with the elements below it, and store the result in a new list/array, without using a loop.
A simple example of what I mean:
a = numpy.array([4,3,2,1])
result = [4-3, 4-2, 4-1, 3-2, 3-1, 2-1] = [1, 2, 3, 1, 2 ,1]
Note that the 'real' array I am working with doesn't contain numbers in sequence. This is just to make the example simple.
I know the result should have (n-1)! elements, where n is the size of the array.
Is there a way to do this without using a loop, but by repeating the array in a 'smart' way?
Thanks!
Upvotes: 1
Views: 2120
Reputation: 280261
temp = a[:, None] - a
result = temp[np.triu_indices(len(a), k=1)]
Perform all pairwise subtractions to produce temp
, including subtracting elements from themselves and subtracting earlier elements from later elements, then use triu_indices
to select the results we want. (a[:, None]
adds an extra length-1 axis to a
.)
Note that almost all of the runtime is spent constructing result
from temp
(because triu_indices
is slow and using indices to select the upper triangle of an array is slow). If you can use temp
directly, you can save a lot of time:
In [13]: a = numpy.arange(2000)
In [14]: %%timeit
....: temp = a[:, None] - a
....:
100 loops, best of 3: 6.99 ms per loop
In [15]: %%timeit
....: temp = a[:, None] - a
....: result = temp[numpy.triu_indices(len(a), k=1)]
....:
10 loops, best of 3: 51.7 ms per loop
Upvotes: 3
Reputation: 221514
Here's a masking
based approach for the extraction after broadcasted subtractions and for the mask creation we are again making use of broadcasting
(double broadcasting
powered so to speak) -
r = np.arange(a.size)
out = (a[:, None] - a)[r[:,None] < r]
Runtime test
Vectorized approaches -
# @user2357112's solution
def pairwise_diff_triu_indices_based(a):
return (a[:, None] - a)[np.triu_indices(len(a), k=1)]
# Proposed in this post
def pairwise_diff_masking_based(a):
r = np.arange(a.size)
return (a[:, None] - a)[r[:,None] < r]
Timings -
In [109]: a = np.arange(2000)
In [110]: %timeit pairwise_diff_triu_indices_based(a)
10 loops, best of 3: 36.1 ms per loop
In [111]: %timeit pairwise_diff_masking_based(a)
100 loops, best of 3: 11.8 ms per loop
Closer look at involved performance parameters
Let's dig deep a bit through the timings on this setup to study how much mask based approach helps. Now, for comparison there are two parts - Mask creation vs. indices creation and Mask based boolean indexing vs. integer based indexing.
How much mask creation helps?
In [37]: r = np.arange(a.size)
In [38]: %timeit np.arange(a.size)
1000000 loops, best of 3: 1.88 µs per loop
In [39]: %timeit r[:,None] < r
100 loops, best of 3: 3 ms per loop
In [40]: %timeit np.triu_indices(len(a), k=1)
100 loops, best of 3: 14.7 ms per loop
About 5x
improvement on mask creation over index setup.
How much boolean indexing helps against integer based indexing?
In [41]: mask = r[:,None] < r
In [42]: idx = np.triu_indices(len(a), k=1)
In [43]: subs = a[:, None] - a
In [44]: %timeit subs[mask]
100 loops, best of 3: 4.15 ms per loop
In [45]: %timeit subs[idx]
100 loops, best of 3: 10.9 ms per loop
About 2.5x
improvement here.
Upvotes: 2
Reputation: 211942
a = [4, 3, 2, 1]
differences = ((x - y) for i, x in enumerate(a) for y in a[i+1:])
for diff in differences:
# do something with difference.
pass
Upvotes: 1
Reputation: 5729
Check out itertools.combinations
:
from itertools import combinations
l = [4, 3, 2, 1]
result = []
for n1, n2 in combinations(l, 2):
result.append(n1 - n2)
print result
Results in:
[1, 2, 3, 1, 2, 1]
combinations
returns a generator, so this is good for very large lists :)
Upvotes: 0