Reputation: 287
I initially have an unsorted NumPy array of arrays:
test = np.array([['A', 'A', 'B', 'E', 'A'],
['B', 'E', 'A', 'E', 'B'],
['C', 'D', 'D', 'A', 'C'],
['B', 'D', 'A', 'C', 'A'],
['B', 'A', 'E', 'A', 'E'],
['C', 'D', 'C', 'E', 'D']])
To sort the array based on the first column:
argsortTest = test[:,0].argsort()
test_sorted = test[argsortTest]
test_sorted:
[['A' 'A' 'B' 'E' 'A']
['B' 'E' 'A' 'E' 'B']
['B' 'D' 'A' 'C' 'A']
['B' 'A' 'E' 'A' 'E']
['C' 'D' 'D' 'A' 'C']
['C' 'D' 'C' 'E' 'D']]
I do some processing on the test_sorted array, changing some values (the first column stays intact and the number of rows is unchanged). In the end I want to get back the original row order while keeping the changed values, so I need to 'unsort' the array again based on the first column.
My solution so far:
argsortTestList = argsortTest.tolist()
rangeX = np.arange(6)
unsort_args = [argsortTestList.index(x) for x in rangeX]
unsorted = test_sorted[unsort_args]
I sort first and unsort at the end because the processing is faster on the sorted array. However, since the changes are not reflected in the original array, I have to unsort the result again.
The problem is that this 'unsorting' solution is too slow for my large dataset (around 200K rows).
Upvotes: 4
Views: 1736
Reputation: 3363
Just do
b = np.argsort(argsortTest)
test_sorted[b]
# Output
# array([['A', 'A', 'B', 'E', 'A'],
# ['B', 'E', 'A', 'E', 'B'],
# ['C', 'D', 'D', 'A', 'C'],
# ['B', 'D', 'A', 'C', 'A'],
# ['B', 'A', 'E', 'A', 'E'],
# ['C', 'D', 'C', 'E', 'D']],
# dtype='|S1')
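A quick way to check that this round-trips is the minimal sketch below. The overwrite of the last column is only a stand-in for the real processing, and `restored` is a name introduced here for illustration.

```python
import numpy as np

test = np.array([['A', 'A', 'B', 'E', 'A'],
                 ['B', 'E', 'A', 'E', 'B'],
                 ['C', 'D', 'D', 'A', 'C'],
                 ['B', 'D', 'A', 'C', 'A'],
                 ['B', 'A', 'E', 'A', 'E'],
                 ['C', 'D', 'C', 'E', 'D']])

argsortTest = test[:, 0].argsort()
test_sorted = test[argsortTest]      # advanced indexing returns a copy

# Stand-in for the real processing: overwrite the last column of the sorted copy.
test_sorted[:, -1] = 'Z'

# The argsort of a permutation is its inverse permutation.
b = np.argsort(argsortTest)
restored = test_sorted[b]

# Rows are back in their original order and the edits are kept.
assert (restored[:, :-1] == test[:, :-1]).all()
assert (restored[:, -1] == 'Z').all()
```

Because `test_sorted` is a copy, `test` itself is never modified, which is why the unsort step is needed at all.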
Explanation
Consider the following array
comb = np.column_stack((np.arange(argsortTest.size),argsortTest))
comb
# array([[0, 0],
# [1, 1],
# [2, 3],
# [3, 4],
# [4, 2],
# [5, 5]])
The left column holds the positions in test_sorted and the right column the result of argsort, i.e. the row of test that is placed at each position: position 0 takes row 0, 1 takes row 1, 2 takes row 3, ... Since the left column is sorted, we can simply use advanced indexing to get the ordered array: test[argsortTest].
Now you want to do the inverse, i.e. go from the right column to the left, so that index 0 goes to 0, ..., 4 to 3, 2 to 4, ...
For advanced indexing to work in this direction, the right column must now be sorted:
comb[np.argsort(comb[:,1])]
# array([[0, 0],
# [1, 1],
# [4, 2],
# [2, 3],
# [3, 4],
# [5, 5]])
But this left column is exactly the argsort of the right column in comb, so we find that
test = test_sorted[ np.argsort(argsortTest) ]
I hope this helps you understand the idea.
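If the second argsort ever becomes noticeable at around 200K rows, the inverse permutation can also be built without sorting, by scattering the positions. This is an alternative to the approach above rather than part of it. It is only a sketch: the random permutation and integer rows are stand-in data, and `inverse` and `restored` are names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
argsortTest = rng.permutation(200_000)   # stand-in for a real argsort result

# Scatter: slot argsortTest[i] receives i. This is O(n) and needs no sort.
inverse = np.empty_like(argsortTest)
inverse[argsortTest] = np.arange(argsortTest.size)

# Same result as the argsort-based inverse.
assert np.array_equal(inverse, np.argsort(argsortTest))

# Equivalent shortcut: write the sorted rows straight back to their old slots.
test_sorted = rng.integers(0, 10, size=(argsortTest.size, 5))  # stand-in data
restored = np.empty_like(test_sorted)
restored[argsortTest] = test_sorted
assert np.array_equal(restored, test_sorted[inverse])
```

The shortcut skips the inverse permutation entirely, so it may be the simplest option when only the unsorted array is needed.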
Upvotes: 6