belas

Reputation: 287

How to unsort a np array given the argsort

I initially have an unsorted NumPy array of arrays:

test = np.array([['A', 'A', 'B', 'E', 'A'],
       ['B', 'E', 'A', 'E', 'B'],
       ['C', 'D', 'D', 'A', 'C'],
       ['B', 'D', 'A', 'C', 'A'],
       ['B', 'A', 'E', 'A', 'E'],
       ['C', 'D', 'C', 'E', 'D']])

To sort the array based on the first column:

argsortTest = test[:,0].argsort()
test_sorted = test[argsortTest]

  test_sorted:  
[['A' 'A' 'B' 'E' 'A']
 ['B' 'E' 'A' 'E' 'B']
 ['B' 'D' 'A' 'C' 'A']
 ['B' 'A' 'E' 'A' 'E']
 ['C' 'D' 'D' 'A' 'C']
 ['C' 'D' 'C' 'E' 'D']]

I do some processing on the test_sorted array that changes some values (the first column stays intact and the number of rows is unchanged). In the end I want to recover the original row order while keeping the changed values, so I need to 'unsort' the array again based on the first column.

My solution so far:

argsortTestList = argsortTest.tolist()
rangeX = np.arange(6)  # xrange exists only in Python 2
unsort_args = [argsortTestList.index(x) for x in rangeX]
unsorted = test_sorted[unsort_args]

The reason I sort and then unsort is that I get better performance when working on the sorted array. However, since the changes are not reflected in the original array, I have to unsort the result at the end.

However, this 'unsorting' solution is too slow (large dataset: around 200K rows).

Upvotes: 4

Views: 1736

Answers (1)

plonser

Reputation: 3363

Just do

b = np.argsort(argsortTest)
test_sorted[b]

# Output
# array([['A', 'A', 'B', 'E', 'A'],
#        ['B', 'E', 'A', 'E', 'B'],
#        ['C', 'D', 'D', 'A', 'C'],
#        ['B', 'D', 'A', 'C', 'A'],
#        ['B', 'A', 'E', 'A', 'E'],
#        ['C', 'D', 'C', 'E', 'D']], 
#       dtype='|S1')
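As a self-contained check, the sketch below rebuilds the arrays from the question, changes one value outside the key column of the sorted copy, and restores the original row order. The edit to 'Z' and the name `restored` are assumptions added for illustration and are not part of the question.

```python
import numpy as np

test = np.array([['A', 'A', 'B', 'E', 'A'],
                 ['B', 'E', 'A', 'E', 'B'],
                 ['C', 'D', 'D', 'A', 'C'],
                 ['B', 'D', 'A', 'C', 'A'],
                 ['B', 'A', 'E', 'A', 'E'],
                 ['C', 'D', 'C', 'E', 'D']])

argsortTest = test[:, 0].argsort()
test_sorted = test[argsortTest]   # advanced indexing returns a copy

# Hypothetical processing step: change a value outside the key column.
test_sorted[2, 4] = 'Z'

# The argsort of the permutation puts the rows back in their original order.
b = np.argsort(argsortTest)
restored = test_sorted[b]
```

The changed value ends up in row `argsortTest[2]` of `restored`, i.e. in the row of `test` that was at position 2 after sorting.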

Explanation

Consider the following array

comb = np.column_stack((np.arange(argsortTest.size),argsortTest))
comb

# array([[0, 0],
#        [1, 1],
#        [2, 3],
#        [3, 4],
#        [4, 2],
#        [5, 5]])

The left column holds the row positions in test_sorted and the right column holds the result of argsort, i.e. the row of test that ends up at each position: position 0 takes row 0, 1 takes row 1, 2 takes row 3, ... Since the left column is sorted, we can simply use advanced indexing with the right column to get the ordered array: test[argsortTest].

Now you want the inverse, i.e. to go from the right column to the left, so that 0 goes to 0, ..., 4 to 3, 2 to 4, ... For advanced indexing with the left column to work, the right column must now be sorted:

comb[np.argsort(comb[:,1])]

# array([[0, 0],
#        [1, 1],
#        [4, 2],
#        [2, 3],
#        [3, 4],
#        [5, 5]])

But since this left column is exactly the argsort of the right column in comb, we find that

test = test_sorted[ np.argsort(argsortTest) ]
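The identity used here, that the argsort of a permutation is its inverse, can be checked on the index array from this example. This is a small sketch; the names `inverse` and `identity` are introduced only for illustration.

```python
import numpy as np

# The argsort result from the example above.
argsortTest = np.array([0, 1, 3, 4, 2, 5])

# Argsort of a permutation yields its inverse permutation.
inverse = np.argsort(argsortTest)
print(inverse)  # [0 1 4 2 3 5]

# Composing the two in either order gives the identity permutation.
identity = np.arange(argsortTest.size)
print(np.array_equal(argsortTest[inverse], identity))  # True
print(np.array_equal(inverse[argsortTest], identity))  # True
```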

I hope this helps to understand the idea.
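As a possible alternative that is not part of the answer above: because the argsort result is a permutation, its inverse can also be built without a second sort, by scatter assignment. This runs in linear time, whereas the `list.index` loop from the question does a linear search per row and is therefore quadratic. The sketch below uses random integer keys as stand-in data at roughly the size mentioned in the question; `keys`, `order`, `inverse` and `restored` are hypothetical names.

```python
import numpy as np

# Stand-in data: 200K random integer keys (an assumption, not the real dataset).
rng = np.random.default_rng(0)
keys = rng.integers(0, 1000, size=200_000)

order = keys.argsort()
sorted_keys = keys[order]

# Option 1: build the inverse permutation by scatter assignment.
# order[i] is the original index of sorted position i, so inverse[order[i]] = i.
inverse = np.empty_like(order)
inverse[order] = np.arange(order.size)

# Option 2: skip the inverse and write the sorted rows straight back
# to their original positions.
restored = np.empty_like(sorted_keys)
restored[order] = sorted_keys
```

Both options should give the same result as indexing with `np.argsort(order)`; which one is faster in practice depends on the data and is worth timing.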

Upvotes: 6
