Reputation: 35
I have an array containing millions of entries. I would like to calculate another vector containing the distances between all pairs of entries that are shifted by a certain offset delta in the array.
Currently I'm using this:
import numpy
difs = numpy.array([])
for i in range(0, len(a) - delta):
    difs = numpy.append(difs, a[i + delta] - a[i])
Does anyone know how to do this faster?
There's a similar question here: Fastest pairwise distance metric in python
But I don't want to calculate the distance for every pair.
Example:
>>> a = [1,5,7,7,2,6]
>>> delta = 2
>>> print difs
array([ 6., 2., -5., -1.])
Upvotes: 1
Views: 276
Reputation: 176850
You could just slice a using delta and then subtract the two subarrays:
>>> import numpy as np
>>> a = np.array([1,5,7,7,2,6])
>>> delta = 2
>>> a[delta:] - a[:-delta]
array([ 6, 2, -5, -1])
This slicing operation is likely to be very quick for large arrays, as no additional indexes or copies of the data in a need to be created. The subtraction then creates a new array with the required values.
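If you want to check the speed difference on your own machine, a rough timeit comparison along these lines should do; the array size and repeat count below are only illustrative, not taken from the question:

import timeit

setup = """
import numpy as np
a = np.random.rand(10000)   # illustrative size; the real array is much larger
delta = 2
"""

loop_stmt = """
difs = np.array([])
for i in range(len(a) - delta):
    difs = np.append(difs, a[i + delta] - a[i])
"""

slice_stmt = "difs = a[delta:] - a[:-delta]"

# keep number small, the append-in-a-loop version gets slow quickly
print("loop + append:", timeit.timeit(loop_stmt, setup=setup, number=10))
print("slicing:      ", timeit.timeit(slice_stmt, setup=setup, number=10))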
Upvotes: 2
Reputation: 4311
Assuming a is a numpy.array, one could probably get the same result by indexing all pairs at once. This is a vectorized numpy solution.
import numpy
a = numpy.atleast_1d(a)  # make sure a is a numpy array
idx_minuend = range(delta, len(a))
idx_subtrahend = range(0, len(a) - delta)
difs = a[idx_minuend] - a[idx_subtrahend]
A little test verifies that the results are the same:
# a little test with your data
import numpy
a = [1,5,7,7,2,6]
delta = 2

# current version
difs = numpy.array([])
for i in range(0, len(a) - delta):
    difs = numpy.append(difs, a[i + delta] - a[i])

# numpy vectorized version
a = numpy.atleast_1d(a)  # make sure a is a numpy array
idx_minuend = range(delta, len(a))
idx_subtrahend = range(0, len(a) - delta)
difs2 = a[idx_minuend] - a[idx_subtrahend]

# compare results
(difs == difs2).all()  # True
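As a minor variation (just a sketch, not required for the answer above), the index vectors could also be built with numpy.arange instead of Python's range, which keeps the indexing entirely inside numpy and gives the same result:

# same pairing, but with numpy index arrays instead of range objects
idx_minuend = numpy.arange(delta, len(a))
idx_subtrahend = numpy.arange(len(a) - delta)
difs3 = a[idx_minuend] - a[idx_subtrahend]
(difs2 == difs3).all()  # True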
Upvotes: 0