Reputation: 462
I'm trying to sort two large four-dimensional arrays in NumPy.
I want to sort the first array by its values along axis 2, and reorder the second array with the same indices. All other axes should keep their order in both arrays.
The following code does what I want, but it relies on looping in Python, so it's slow. The arrays are quite large, so I'd really like to do this with compiled NumPy operations for performance reasons, or find some other way to get this block of code compiled (Cython?).
import numpy as np

data = np.random.rand(10,6,4,1)
data2 = np.random.rand(10,6,4,3)

print(data[0,0,:,:])
print(data2[0,0,:,:])

# Sort each (n, m) slice along axis 2 by the values in data,
# and apply the same order to data2
for n in range(data.shape[0]):
    for m in range(data.shape[1]):
        sort_ids = np.argsort(data[n,m,:,0])
        data[n,m,:,:] = data[n,m,sort_ids,:]
        data2[n,m,:,:] = data2[n,m,sort_ids,:]

print(data[0,0,:,:])
print(data2[0,0,:,:])
Upvotes: 1
Views: 202
Reputation: 462
Found a way to make this work. It requires storing an index array, which may cause some memory issues for me, but it's way faster. Example code with timing comparison:
import numpy as np
import time

loops = 1000
data = np.random.rand(100,6,4,1)
data2 = np.random.rand(100,6,4,3)

# Vectorized version: fancy indexing with explicit index arrays
start = time.time()
for n in range(loops):
    idxs = np.indices(data.shape)
    idxs2 = np.indices(data2.shape)
    sort_ids = np.argsort(data, 2)
    sorted_data = data[idxs[0], idxs[1], sort_ids, idxs[3]]
    sorted_data2 = data2[idxs2[0], idxs2[1], np.repeat(sort_ids, data2.shape[3], 3), idxs2[3]]
print('Time Elapsed: %5.2f seconds' % (time.time() - start))

# Loop version: Python loops over the first two axes
start = time.time()
for n in range(loops):
    sorted_data = np.zeros(data.shape)
    sorted_data2 = np.zeros(data2.shape)
    # Use i and j here so the inner loops do not reuse the outer loop variable n
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            sort_ids = np.argsort(data[i,j,:,0])
            # Write into the output arrays so data stays unsorted between timing runs
            sorted_data[i,j,:,:] = data[i,j,sort_ids,:]
            sorted_data2[i,j,:,:] = data2[i,j,sort_ids,:]
print('Time Elapsed: %5.2f seconds' % (time.time() - start))
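On the memory concern, here is a minimal sketch, assuming open (broadcastable) index grids from `np.ogrid` are an acceptable substitute for the dense arrays from `np.indices`. The names `i0`, `i1`, `i3`, `dense`, and `reference` are only for illustration. Each open grid has length 1 on every axis but its own, so the index arrays stay small and NumPy broadcasts them against `sort_ids` during indexing.

```python
import numpy as np

data = np.random.rand(100, 6, 4, 1)
data2 = np.random.rand(100, 6, 4, 3)

# Sort order along axis 2, taken from the first array; shape (100, 6, 4, 1).
sort_ids = np.argsort(data, axis=2)

# Dense index arrays: one full-size integer array per axis of data2.
dense = np.indices(data2.shape)

# Open grids: one small array per axis, with length 1 on all other axes.
# The grid for axis 2 is discarded because sort_ids takes its place.
s = data2.shape
i0, i1, _, i3 = np.ogrid[:s[0], :s[1], :s[2], :s[3]]

# The four index arrays broadcast to the full shape (100, 6, 4, 3).
sorted_data2 = data2[i0, i1, sort_ids, i3]

# Dense-index version, for comparison only.
reference = data2[dense[0], dense[1],
                  np.repeat(sort_ids, data2.shape[3], 3), dense[3]]

# Memory held by the index arrays in each approach.
dense_bytes = dense.nbytes
open_bytes = i0.nbytes + i1.nbytes + i3.nbytes
```

Comparing `dense_bytes` with `open_bytes` shows how much smaller the open grids are; the indexed result itself is still a full-size copy in both cases.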
Upvotes: 0
Reputation: 3363
Maybe there is a better solution, but this should work:
sort_ids = np.argsort(data,axis=2)
s1 = data.shape
s2 = data2.shape
d1 = data[np.arange(s1[0])[:,None,None,None],np.arange(s1[1])[None,:,None,None],sort_ids,np.arange(s1[3])[None,None,None,:]]
d2 = data2[np.arange(s2[0])[:,None,None,None],np.arange(s2[1])[None,:,None,None],sort_ids,np.arange(s2[3])[None,None,None,:]]
At least the output is identical to your code.
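A possible shorthand for the same indexing, assuming NumPy 1.15 or later is available, is `np.take_along_axis`, which builds equivalent broadcast index arrays internally. The sketch below is an illustration rather than part of the code above; `expected1`, `expected2`, and `order` are names introduced only for the cross-check against the loop from the question.

```python
import numpy as np

data = np.random.rand(10, 6, 4, 1)
data2 = np.random.rand(10, 6, 4, 3)

# Sort order along axis 2, taken from the first array; shape (10, 6, 4, 1).
sort_ids = np.argsort(data, axis=2)

# sort_ids broadcasts across the last axis of data2 (length 1 against length 3).
d1 = np.take_along_axis(data, sort_ids, axis=2)
d2 = np.take_along_axis(data2, sort_ids, axis=2)

# Cross-check against the explicit Python loop from the question,
# writing into fresh arrays so the inputs stay untouched.
expected1 = np.empty_like(data)
expected2 = np.empty_like(data2)
for n in range(data.shape[0]):
    for m in range(data.shape[1]):
        order = np.argsort(data[n, m, :, 0])
        expected1[n, m] = data[n, m, order]
        expected2[n, m] = data2[n, m, order]

assert np.array_equal(d1, expected1)
assert np.array_equal(d2, expected2)
```

Because the index array only needs to broadcast against the non-sorted axes, there is no need to repeat `sort_ids` along the last axis of `data2`.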
Upvotes: 1