I've got two numpy arrays, lst and a, and I want to reorder the rows of a so that its first column follows the same order as lst, while keeping each row of a intact.
import numpy as np
lst = np.array(['a','b','d','e','c'])
a = np.array([['e','b','a','d','c'],[1,2,3,4,5]]).T
My desired outcome is:
array([['a', '3'],
['b', '2'],
['d', '4'],
['e', '1'],
['c', '5']])
I can get this outcome via pandas:
import pandas as pd
(pd.DataFrame(lst, columns=['col'])
.merge(pd.DataFrame(a, columns=['col','data']), on='col')
.to_numpy())
but I was wondering if it's possible to do it only using numpy.
Upvotes: 2
Views: 2683
Reputation: 1879
The following code works if, as in your example, the two arrays have the same number of elements and contain the same values up to order. Then it's just a matter of indexing, which can be done with a few sorts: arg-sort both arrays to get the indices that sort each one, then arg-sort idx1 (the permutation that sorts lst) once more to get the inverse permutation:
import numpy as np
idx1 = np.argsort(lst)
idx2 = np.argsort(a[:, 0])
idx1_inv = np.argsort(idx1)
result = a[idx2][idx1_inv]
Here, a[idx2] sorts a by its first column (keeping rows intact), and indexing the result with [idx1_inv] restores the order of lst.
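As a side note on the inverse step, a minimal sketch: the inverse of a permutation can also be built by scattering, in O(n), instead of a second argsort, and the two fancy indexes can be composed into one. The variable names match the answer; the scatter trick is a standard alternative, not part of the original answer.

```python
import numpy as np

lst = np.array(['a', 'b', 'd', 'e', 'c'])
a = np.array([['e', 'b', 'a', 'd', 'c'], [1, 2, 3, 4, 5]]).T

idx1 = np.argsort(lst)      # permutation that sorts lst
idx2 = np.argsort(a[:, 0])  # permutation that sorts a by its first column

# Inverse permutation by scattering: idx1_inv[idx1[k]] = k,
# i.e. idx1_inv[j] is the rank of lst[j] in sorted order.
idx1_inv = np.empty_like(idx1)
idx1_inv[idx1] = np.arange(len(idx1))

# Same permutation as the argsort-of-argsort version
assert np.array_equal(idx1_inv, np.argsort(idx1))

# Compose the two index arrays and index a only once
result = a[idx2[idx1_inv]]
print(result)
```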
It might be generalizable to the case where a contains an arbitrary subset of the values of lst, possibly with repetitions, but I guess that would get annoying. An easier solution, which also handles that case, is to do the lookup in plain Python with a dictionary:
value_to_idx = {v: i for i, v in enumerate(lst)}
target_position = [value_to_idx[x] for x in a[:, 0]]
indices = np.argsort(target_position)
result = a[indices]
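For the subset-with-repetitions case, a vectorized numpy-only variant of the dictionary lookup could look like the sketch below. It swaps the dict for np.searchsorted on a sorted copy of lst, and it assumes the values of lst are unique. The input a here is hypothetical: a subset of lst's values with 'd' repeated. kind='stable' keeps duplicate keys in their original order within a.

```python
import numpy as np

lst = np.array(['a', 'b', 'd', 'e', 'c'])
# Hypothetical input: a subset of lst's values, with the key 'd' repeated
a = np.array([['d', 'b', 'c', 'd'], [1, 2, 3, 4]]).T

keys = a[:, 0]
order = np.argsort(lst)        # lst[order] is sorted
sorted_lst = lst[order]
pos = np.searchsorted(sorted_lst, keys)  # position of each key in sorted_lst

# Guard: every key must actually occur in lst
safe_pos = np.minimum(pos, len(lst) - 1)
if not np.all((pos < len(lst)) & (sorted_lst[safe_pos] == keys)):
    raise ValueError("a[:, 0] contains values not present in lst")

target_position = order[pos]   # position of each key in the original lst
# Stable sort so that rows with equal keys keep their relative order from a
indices = np.argsort(target_position, kind='stable')
result = a[indices]
print(result)
```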
Note that both of these solutions run in O(n log n) (where n = len(a)), while MichaelCG8's solution using np.meshgrid runs in O(n^2) time and memory.
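To make the O(n^2) cost concrete, a minimal sketch: the meshgrid approach materializes n x n grids and an n x n boolean mask, while the argsort approach only builds length-n index arrays. Integer keys stand in for strings here, and n = 1000 is an arbitrary illustrative size.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
lst = np.arange(n)
keys = rng.permutation(n)  # hypothetical first column: a shuffled copy of lst

# meshgrid approach: two n x n grids plus an n x n boolean mask
lst_grid, key_grid = np.meshgrid(lst, keys)
mask = lst_grid == key_grid
print(mask.shape, mask.nbytes)  # (1000, 1000) 1000000

# argsort approach: only length-n index arrays
idx = np.argsort(keys)[np.argsort(np.argsort(lst))]

# Both produce the same row order
assert np.array_equal(idx, mask.argmax(axis=0))
```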
Upvotes: 2
Reputation: 579
I'm assuming all your values in lst are unique, lst and a are the same length, and they contain the same characters that you're sorting by.
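The three assumptions just listed can be checked up front. A minimal sketch; the helper name same_keys_unique is hypothetical and not part of numpy.

```python
import numpy as np

def same_keys_unique(lst, first_col):
    """Hypothetical helper: True if lst has no duplicates, has the same
    length as first_col, and both contain exactly the same values."""
    return (len(lst) == len(first_col)
            and len(np.unique(lst)) == len(lst)
            and np.array_equal(np.sort(lst), np.sort(first_col)))

lst = np.array(['a', 'b', 'd', 'e', 'c'])
a = np.array([['e', 'b', 'a', 'd', 'c'], [1, 2, 3, 4, 5]]).T
print(same_keys_unique(lst, a[:, 0]))   # True
print(same_keys_unique(lst, a[:4, 0]))  # False (lengths differ)
```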
import numpy as np

lst = np.array(['a','b','d','e','c'])
a = np.array([['e','b','a','d','c'],[1,2,3,4,5]]).T
first_col = a[:, 0]
# lst_grid[i, j] == lst[j] and first_col_grid[i, j] == first_col[i]
lst_grid, first_col_grid = np.meshgrid(lst, first_col)
# For each position j in lst, the row i of a whose first column equals lst[j]
indices = np.argmax(lst_grid == first_col_grid, axis=0)
print(indices)
# [2 1 3 0 4]
result = a[indices]
print(result)
# [['a' '3']
#  ['b' '2']
#  ['d' '4']
#  ['e' '1']
#  ['c' '5']]
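The same comparison mask can be built with broadcasting instead of meshgrid, which avoids allocating the two explicit grids (the mask itself is still n x n). One caveat worth guarding against: np.argmax returns 0 for a column with no True entry, so a value of lst missing from a would silently map to row 0. A minimal sketch with that check added:

```python
import numpy as np

lst = np.array(['a', 'b', 'd', 'e', 'c'])
a = np.array([['e', 'b', 'a', 'd', 'c'], [1, 2, 3, 4, 5]]).T
first_col = a[:, 0]

# matches[i, j] is True when first_col[i] == lst[j] (same mask as the meshgrid version)
matches = first_col[:, None] == lst[None, :]

# argmax would silently return 0 for a column with no match, so check first
if not matches.any(axis=0).all():
    raise ValueError("some value of lst does not occur in a[:, 0]")

indices = matches.argmax(axis=0)
result = a[indices]
print(indices)  # [2 1 3 0 4]
```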
Upvotes: 1