Reputation: 37
I have a dataframe with two columns.
One column has integers: 1,2,3, .... 10,11,12, ...
One column has alphanumerics (as strings): A1, A2, ... A10, A11, ... B1, B2, ... B10, B11 ... (always one letter + one or two numbers, letters A-H, numbers 1-12)
Goal is to sort the dataframe by the integer column ascending with first priority, and by the alphanumerical column with second priority. The alphanumerical column should be sorted like this: A1, B1, C1, D1, ... A2, B2, C2, D2, ..., A3, B3, C3, D3, ...
(this is a problem from biology/wetlab/robotics/automation/96-well plates).
Any ideas?
Upvotes: 1
Views: 1374
Reputation: 23217
You can do it this way:
Letter
to hold the first letter and another column Number
to hold a number of one or two digits.Number
column from string to integer.Let's illustrate the process with an example below:
Assume we have the dataframe df
as follows:
print(df)
Col1 Col2
0 2 B12
1 11 C2
2 2 A1
3 11 B2
4 2 B1
5 11 C12
6 2 A12
7 11 C1
8 2 A2
Step 1 & 2: Split Col2
into 2 columns Letter
& Number
+ Convert Number
column from string to integer:
df['Letter'] = df['Col2'].str[0] # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int) # take 2nd char onwards and convert to integer
Result:
print(df)
Col1 Col2 Letter Number
0 2 B12 B 12
1 11 C2 C 2
2 2 A1 A 1
3 11 B2 B 2
4 2 B1 B 1
5 11 C12 C 12
6 2 A12 A 12
7 11 C1 C 1
8 2 A2 A 2
Step 3: Sort Col1
, Letter
and Number
with priority: Col1
---> Number
---> Letter
:
df = df.sort_values(by=['Col1', 'Number', 'Letter'])
Result:
print(df)
Col1 Col2 Letter Number
2 2 A1 A 1
4 2 B1 B 1
8 2 A2 A 2
6 2 A12 A 12
0 2 B12 B 12
7 11 C1 C 1
3 11 B2 B 2
1 11 C2 C 2
5 11 C12 C 12
After sorting, you can remove the Letter
and Number
columns, as follows:
df = df.drop(['Letter', 'Number'], axis=1)
If you want to do all in one step, you can also chain the instructions, as follows:
df = (df.assign(Letter=df['Col2'].str[0],
Number=df['Col2'].str[1:].astype(int))
.sort_values(by=['Col1', 'Number', 'Letter'])
.drop(['Letter', 'Number'], axis=1)
)
Result:
print(df)
Col1 Col2
2 2 A1
4 2 B1
8 2 A2
6 2 A12
0 2 B12
7 11 C1
3 11 B2
1 11 C2
5 11 C12
Upvotes: 2
Reputation: 261580
You can use the key
argument of sort_values. Just apply a transformation to change 'B1' to '01B':
df = pd.DataFrame({'col1': ['C1', 'C1', 'A2', 'B12', 'B1','B2'],
'col2': [2,1,3,4,2,5]})
(df.sort_values(by='col2')
.sort_values(by='col1',
key=lambda x: x.str[1:].str.pad(2, 'left', '0')+x.str[0])
)
Output:
col1 col2
4 B1 2
1 C1 1
0 C1 2
2 A2 3
5 B2 5
3 B12 4
Upvotes: 0
Reputation: 21
Based on your example you could separate your alphanumerical column first and then sort it accordingly. You could:
df['numericalValue'] = df['alphaNumerical'][1:] # Assuming you only have 1 letter
df = df.sort_values(['numericalValue', 'alphaNumerical'], ascending = (True, True))
This would first sort your dataframe by the numerical value, as shown in your example and then secondarily sort the values that share the same numerical value by its letter.
If you have a variable number of non-numeric characters you can use something like this:
Good luck!
Upvotes: 0