schoolie

Reputation: 462

Numpy - Sorting two ndarrays by single axis of first array

I'm trying to sort two large four-dimensional arrays in NumPy.

I want to sort the first array by its values along axis 2, and reorder the second array with the same indices. All other axes should remain in the same order for both arrays.

The following code does what I want, but it relies on looping in Python, so it's slow. The arrays are quite large, so I'd really like to do this with compiled NumPy operations for performance reasons. Alternatively, is there some other way to get this block of code compiled (Cython?)

import numpy as np

data = np.random.rand(10,6,4,1)
data2 = np.random.rand(10,6,4,3)

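# Show one slice of each array before sorting
print(data[0,0,:,:])
print(data2[0,0,:,:])

for n in range(data.shape[0]):
  for m in range(data.shape[1]):

    # Sort order along axis 2 for this (n, m) slice
    sort_ids = np.argsort(data[n,m,:,0])

    # Apply the same order to both arrays
    data[n,m,:,:] = data[n,m,sort_ids,:]
    data2[n,m,:,:] = data2[n,m,sort_ids,:]


# Show the same slices after sorting
print(data[0,0,:,:])
print(data2[0,0,:,:])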

Upvotes: 1

Views: 202

Answers (2)

schoolie

Reputation: 462

Found a way to make this work. It requires storing an index array, which may cause some memory issues for me, but it's way faster. Example code with timing comparison:

import numpy as np
import time

loops = 1000

data = np.random.rand(100,6,4,1)
data2 = np.random.rand(100,6,4,3)

# Vectorized version: full index arrays plus fancy indexing
start = time.time()
for _ in range(loops):

  idxs = np.indices(data.shape)
  idxs2 = np.indices(data2.shape)

  sort_ids = np.argsort(data, 2)

  sorted_data = data[idxs[0], idxs[1], sort_ids, idxs[3]]
  sorted_data2 = data2[idxs2[0], idxs2[1], np.repeat(sort_ids, data2.shape[3], 3), idxs2[3]]

print('Time Elapsed: %5.2f seconds' % (time.time() - start))


# Python-loop version, writing into fresh output arrays so the inputs
# stay unsorted and every timed pass does the same work
start = time.time()
for _ in range(loops):

  sorted_data = np.zeros(data.shape)
  sorted_data2 = np.zeros(data2.shape)

  for n in range(data.shape[0]):
    for m in range(data.shape[1]):

      sort_ids = np.argsort(data[n,m,:,0])

      sorted_data[n,m,:,:] = data[n,m,sort_ids,:]
      sorted_data2[n,m,:,:] = data2[n,m,sort_ids,:]


print('Time Elapsed: %5.2f seconds' % (time.time() - start))
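On the memory concern: the full arrays returned by np.indices are what take up the space. A possible way around that, sketched below, is to ask np.indices for sparse (broadcastable) index arrays instead. This assumes a NumPy version that supports the sparse keyword (1.17 or later, as far as I know); the seeded generator and the variable names i0, j0, etc. are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the example is reproducible
data = rng.random((100, 6, 4, 1))
data2 = rng.random((100, 6, 4, 3))

# Sort order along axis 2 for every (n, m) slice; shape (100, 6, 4, 1)
sort_ids = np.argsort(data, axis=2)

# sparse=True returns one small array per axis, with shapes such as
# (100, 1, 1, 1) and (1, 6, 1, 1), instead of four full-size arrays
i0, i1, _, i3 = np.indices(data.shape, sparse=True)
j0, j1, _, j3 = np.indices(data2.shape, sparse=True)

# Broadcasting expands the small index arrays (and the size-1 last axis
# of sort_ids) to the full output shape during indexing
sorted_data = data[i0, i1, sort_ids, i3]
sorted_data2 = data2[j0, j1, sort_ids, j3]
```

With this variant the np.repeat call is no longer needed, because sort_ids broadcasts against the last-axis index of data2.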

Upvotes: 0

plonser

Reputation: 3363

There may be a better solution, but this should work:

sort_ids = np.argsort(data,axis=2)

s1 = data.shape
s2 = data2.shape
# Broadcastable index arrays for axes 0, 1 and 3; sort_ids reorders axis 2
d1 = data[np.arange(s1[0])[:,None,None,None],
          np.arange(s1[1])[None,:,None,None],
          sort_ids,
          np.arange(s1[3])[None,None,None,:]]
d2 = data2[np.arange(s2[0])[:,None,None,None],
           np.arange(s2[1])[None,:,None,None],
           sort_ids,
           np.arange(s2[3])[None,None,None,:]]

At least the output is identical to your code.
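For what it's worth, newer NumPy versions (1.15 and later) provide np.take_along_axis, which builds this kind of broadcast index internally. The following is a minimal sketch rather than part of the code above; it assumes the same array shapes as in the question and uses a seeded generator only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the example is reproducible
data = rng.random((10, 6, 4, 1))
data2 = rng.random((10, 6, 4, 3))

# Sort order along axis 2 for every (n, m) slice; shape (10, 6, 4, 1)
sort_ids = np.argsort(data, axis=2)

# The size-1 last axis of sort_ids broadcasts against the last axis of data2,
# so the same ordering is applied to all three columns
sorted_data = np.take_along_axis(data, sort_ids, axis=2)
sorted_data2 = np.take_along_axis(data2, sort_ids, axis=2)
```

Neither input array is modified; both results are new arrays with the same shapes as data and data2.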

Upvotes: 1
