Reputation: 1471
Is there a way to sort the rows of an array by the last element, in this case the cell IDs? The cell ID is built as follows: "CellID_NumberOfCell".
arr = np.array([['65.0','30.0','20.0','0.0','0_0'],
                ['2.0','29.0','24.0','0.0','1_0'],
                ['0.0','18.0','4.0','0.0','2_0'],
                ['16.0','9.0','0.0','9990.0','7_203'],
                ['16.0','9.0','0.0','9990.0','0_203'],
                ['20.0','23.0','31.0','9990.0','8_158'],
                ['65.0','30.0','20.0','0.0','0_10']])
So after sorting it should look like:
arr = np.array([['65.0','30.0','20.0','0.0','0_0'],
                ['65.0','30.0','20.0','0.0','0_10'],
                ['16.0','9.0','0.0','9990.0','0_203'],
                ['2.0','29.0','24.0','0.0','1_0'],
                ['0.0','18.0','4.0','0.0','2_0'],
                ['16.0','9.0','0.0','9990.0','7_203'],
                ['20.0','23.0','31.0','9990.0','8_158']])
EDIT:
Is it also possible to delete the numbers after the underscore after sorting, so that I just have the ID? For example, instead of 0_0, just 0.
EDIT2:
After sorting by ID, the rows should also be sorted by time, so that, for example, all rows with ID 0 are additionally ordered by time 0, 1, ..., 9999, etc.
Reputation: 221664
We need to split the last column by that underscore, lexsort it, and then use those indices to sort the input array.
Thus, an implementation would be -
import numpy as np

def numpy_app(arr):
    # Split the strings in the last column on '_'. For the given sample,
    # this turns the last column into 3 columns: the part before '_',
    # the '_' itself, and the part after it.
    a = np.char.partition(arr[:,-1],'_')
    # Lexsort on the numeric columns (0 and 2). np.lexsort uses the last key
    # as the primary one, so we flip the column order to (2, 0) to give
    # precedence to the first string (the cell ID).
    sidx = np.lexsort(a[:,2::-2].astype(int).T)
    # Index into the input array with the lex-sorted indices for the final output
    return arr[sidx]
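To make the lexsort step concrete, here is a sketch that computes the intermediate values for the question's sample array. It uses np.char.partition, the public alias of np.core.defchararray.partition (np.core is deprecated as of NumPy 2.0); the names a, keys and sidx are only for illustration.

```python
import numpy as np

# The question's sample array (every entry is a string).
arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

# Each ID becomes (before, '_', after), so a has shape (7, 3).
a = np.char.partition(arr[:, -1], '_')

# Take columns 2 and 0 (in that order) as integers, one key per row.
keys = a[:, 2::-2].astype(int).T
# keys[0] = number after '_' (secondary key)
# keys[1] = cell ID (primary key: lexsort sorts by the LAST key first)
sidx = np.lexsort(keys)
print(sidx)  # [0 6 4 1 2 3 5]
print(arr[sidx, -1])
```

Reading sidx back into the last column gives 0_0, 0_10, 0_203, 1_0, 2_0, 7_203, 8_158, which is the order the question asks for.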
Based on the edits to the question, it seems we also want to drop the part of the string after the underscore. To do so, here's a modified version -
def numpy_cut_app(arr):
    a = np.char.partition(arr[:,-1],'_')
    sidx = np.lexsort(a[:,2::-2].astype(int).T)
    out = arr[sidx]
    # Replace the last column with the first part of its split strings (the cell ID)
    out[:,-1] = a[sidx,0]
    return out
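A quick usage check on the question's own array, as a sketch; it repeats numpy_cut_app with np.char in place of np.core.defchararray. Because arr[sidx] uses fancy indexing, out is a copy, so overwriting its last column leaves the full IDs in arr untouched.

```python
import numpy as np

def numpy_cut_app(arr):
    # Split each ID into (cell ID, '_', suffix).
    a = np.char.partition(arr[:, -1], '_')
    # Primary key: cell ID; secondary key: suffix (both compared as integers).
    sidx = np.lexsort(a[:, 2::-2].astype(int).T)
    out = arr[sidx]  # fancy indexing returns a copy
    # Keep only the cell ID in the last column of the copy.
    out[:, -1] = a[sidx, 0]
    return out

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

out = numpy_cut_app(arr)
print(out)
```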
Based on further edits, it seems we want to include the fourth column in the lex-sort and ignore everything after the underscore in the last column. So, a further modified version would be -
def numpy_cut_col3_app(arr):
    a = np.char.partition(arr[:,-1],'_')
    # Lex-sort by the first part of the split strings from the last column
    # (the cell ID, converted to int so that e.g. 10 sorts after 2; it gets
    # precedence as the last key) and then by column 3 of the input array
    sidx = np.lexsort([arr[:,3].astype(float), a[:,0].astype(int)])
    out = arr[sidx]
    out[:,-1] = a[sidx,0]
    return out
Sample runs -
In [567]: arr
Out[567]: 
array([['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']], 
      dtype='|S6')

In [568]: numpy_app(arr)
Out[568]: 
array([['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']], 
      dtype='|S6')

In [569]: numpy_cut_app(arr)
Out[569]: 
array([['2.0', '29.0', '24.0', '0.0', '1'],
       ['0.0', '18.0', '4.0', '0.0', '2'],
       ['16.0', '9.0', '0.0', '9990.0', '7'],
       ['20.0', '23.0', '31.0', '9990.0', '8'],
       ['16.0', '9.0', '0.0', '9990.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9']], 
      dtype='|S6')
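The sample runs above don't exercise numpy_cut_col3_app. Here is a small sketch with made-up rows (hypothetical data, not from the question) where two cell IDs each have several time values in column 3. This version converts the cell ID to int before lex-sorting, so ID 10 sorts after ID 2; compared as strings, '10' would come first.

```python
import numpy as np

def numpy_cut_col3_app(arr):
    a = np.char.partition(arr[:, -1], '_')
    # Last key has the highest precedence: cell ID (as int), then column 3 (as float).
    sidx = np.lexsort([arr[:, 3].astype(float), a[:, 0].astype(int)])
    out = arr[sidx]
    # Keep only the cell ID in the last column.
    out[:, -1] = a[sidx, 0]
    return out

# Hypothetical rows: column 0 is a row label, column 3 is the time.
arr = np.array([['1.0', '0.0', '0.0', '9990.0', '10_1'],
                ['2.0', '0.0', '0.0', '5.0', '2_7'],
                ['3.0', '0.0', '0.0', '0.0', '10_2'],
                ['4.0', '0.0', '0.0', '1.0', '2_3']])

out = numpy_cut_col3_app(arr)
print(out)
```

Rows come out grouped by ID (2, then 10) and, within each ID, in increasing time.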
Reputation: 1003
You can do it easily with sorted and a lambda function; as suggested by @Divakar, wrap the result in np.array to get a NumPy array back:
np.array(sorted(arr, key=lambda x: x[-1]))
Output:
[['65.0', '30.0', '20.0', '0.0', '0_0'],
 ['65.0', '30.0', '20.0', '0.0', '0_10'],
 ['16.0', '9.0', '0.0', '9990.0', '0_203'],
 ['2.0', '29.0', '24.0', '0.0', '1_0'],
 ['0.0', '18.0', '4.0', '0.0', '2_0'],
 ['16.0', '9.0', '0.0', '9990.0', '7_203'],
 ['20.0', '23.0', '31.0', '9990.0', '8_158']]
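Note that key=lambda x: x[-1] compares the IDs as strings. That happens to give the right order for the sample data, but it puts '0_10' before '0_2' and '10_0' before '2_0'. A minimal sketch of a numeric key, assuming every ID has the form <int>_<int> (the helper name id_key and the rows below are hypothetical):

```python
import numpy as np

arr = np.array([['16.0', '9.0', '0.0', '9990.0', '10_0'],
                ['65.0', '30.0', '20.0', '0.0', '0_10'],
                ['2.0', '29.0', '24.0', '0.0', '2_0'],
                ['65.0', '30.0', '20.0', '0.0', '0_2']])

def id_key(row):
    # '0_10' -> (0, 10): compare both parts as integers, not as text.
    cell, _, number = str(row[-1]).partition('_')
    return int(cell), int(number)

lex = np.array(sorted(arr, key=lambda x: x[-1]))  # string comparison
num = np.array(sorted(arr, key=id_key))           # numeric comparison
print(lex[:, -1])
print(num[:, -1])
```

Returning a tuple from the key makes Python compare the cell ID first and the number after the underscore only on ties.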
EDIT: to keep only the ID after sorting, you can use this. It's not pretty, but it does the job:
np.array([np.append(i[:-1], i[-1].split("_")[0]) for i in sorted(arr, key=lambda x: x[-1])])
Output:
array([['65.0', '30.0', '20.0', '0.0', '0'],
       ['65.0', '30.0', '20.0', '0.0', '0'],
       ['16.0', '9.0', '0.0', '9990.0', '0'],
       ['2.0', '29.0', '24.0', '0.0', '1'],
       ['0.0', '18.0', '4.0', '0.0', '2'],
       ['16.0', '9.0', '0.0', '9990.0', '7'],
       ['20.0', '23.0', '31.0', '9990.0', '8']],
      dtype='<U6')
Reputation: 13218
np.argsort(arr[:, -1]) will give you the permutation that puts the elements of the last column of arr in order.
Then, arr[np.argsort(arr[:, -1])] reorders the rows of arr according to this permutation.
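For instance, with a tiny made-up array (illustrative data only), the permutation and the reordered rows look like this:

```python
import numpy as np

arr = np.array([['a', '2_0'],
                ['b', '0_10'],
                ['c', '0_2']])

# Indices that would sort the last column (plain string comparison).
perm = np.argsort(arr[:, -1])
print(perm)  # [1 2 0]

# Fancy indexing with the permutation reorders whole rows.
print(arr[perm])
```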
Beware that lexicographic order is used since your data consists of strings, so 0_10 comes before 0_2. If this is not what you want, you should split the last column, and I advise you to use a pandas.DataFrame:
import pandas as pd
df = pd.DataFrame(arr)
# Split the last column into the cell ID and the number after '_'
parts = df[df.columns[-1]].str.split('_', n=1, expand=True)
df['Cell'] = parts[0].astype(int)
df['CellIndex'] = parts[1].astype(int)
df = df.sort_values(['Cell', 'CellIndex'])
pandas is really the way to go to manipulate this kind of data.
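The question's later edits (keep only the ID, then order by time within each ID) map onto the same DataFrame approach. A sketch, assuming, as in the NumPy answer's last version, that the time is the fourth column (index 3); the helper column names Cell and Time are hypothetical:

```python
import numpy as np
import pandas as pd

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

df = pd.DataFrame(arr)
# Cell ID = text before the first '_', as an integer.
df['Cell'] = df[4].str.split('_', n=1).str[0].astype(int)
# Time = column 3 (assumption), as a float.
df['Time'] = df[3].astype(float)

# Sort numerically by cell ID, then by time.
df = df.sort_values(['Cell', 'Time'])

# Replace the last column with the bare ID and drop the helper columns.
df[4] = df['Cell'].astype(str)
result = df[[0, 1, 2, 3, 4]].to_numpy()
print(result)
```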