hickz
hickz

Reputation: 37

Sort a pandas dataframe by 2 columns (one with integers, one with alphanumerics) with priority for integer column

I have a dataframe with two columns.

One column has integers: 1,2,3, .... 10,11,12, ...

One column has alphanumerics (as strings): A1, A2, ... A10, A11, ... B1, B2, ... B10, B11 ... (always one letter + one or two numbers, letters A-H, numbers 1-12)

Goal is to sort the dataframe by the integer column ascending with first priority, and by the alphanumerical column with second priority. The alphanumerical column should be sorted like this: A1, B1, C1, D1, ... A2, B2, C2, D2, ..., A3, B3, C3, D3, ...

(this is a problem from biology/wetlab/robotics/automation/96-well plates).

Any ideas?

Upvotes: 1

Views: 1374

Answers (3)

SeaBean
SeaBean

Reputation: 23217

You can do it this way:

  1. Split the second column with alphanumeric strings into 2 columns: one column Letter to hold the first letter and another column Number to hold a number of one or two digits.
  2. Convert Number column from string to integer.
  3. Then, sort these 2 new columns together with the first column of integers

Let's illustrate the process with an example below:

Assume we have the dataframe df as follows:

print(df)

   Col1 Col2
0     2  B12
1    11   C2
2     2   A1
3    11   B2
4     2   B1
5    11  C12
6     2  A12
7    11   C1
8     2   A2

Step 1 & 2: Split Col2 into 2 columns Letter & Number + Convert Number column from string to integer:

df['Letter'] = df['Col2'].str[0]               # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int)  # take 2nd char onwards and convert to integer 

Result:

print(df)

   Col1 Col2 Letter  Number
0     2  B12      B      12
1    11   C2      C       2
2     2   A1      A       1
3    11   B2      B       2
4     2   B1      B       1
5    11  C12      C      12
6     2  A12      A      12
7    11   C1      C       1
8     2   A2      A       2

Step 3: Sort Col1, Letter and Number with priority: Col1 ---> Number ---> Letter:

df = df.sort_values(by=['Col1', 'Number', 'Letter'])

Result:

print(df)

   Col1 Col2 Letter  Number
2     2   A1      A       1
4     2   B1      B       1
8     2   A2      A       2
6     2  A12      A      12
0     2  B12      B      12
7    11   C1      C       1
3    11   B2      B       2
1    11   C2      C       2
5    11  C12      C      12

After sorting, you can remove the Letter and Number columns, as follows:

df = df.drop(['Letter', 'Number'], axis=1)

If you want to do all in one step, you can also chain the instructions, as follows:

df = (df.assign(Letter=df['Col2'].str[0], 
                Number=df['Col2'].str[1:].astype(int))
        .sort_values(by=['Col1', 'Number', 'Letter'])
        .drop(['Letter', 'Number'], axis=1)
     )

Result:

print(df)

   Col1 Col2
2     2   A1
4     2   B1
8     2   A2
6     2  A12
0     2  B12
7    11   C1
3    11   B2
1    11   C2
5    11  C12

Upvotes: 2

mozway
mozway

Reputation: 261580

You can use the key argument of sort_values. Just apply a transformation to change 'B1' to '01B':

df = pd.DataFrame({'col1': ['C1', 'C1', 'A2', 'B12', 'B1','B2'],
                   'col2': [2,1,3,4,2,5]})

(df.sort_values(by='col2')
   .sort_values(by='col1',
                key=lambda x: x.str[1:].str.pad(2, 'left', '0')+x.str[0])
)

Output:

  col1  col2
4   B1     2
1   C1     1
0   C1     2
2   A2     3
5   B2     5
3  B12     4

Upvotes: 0

SantiMunoz
SantiMunoz

Reputation: 21

Based on your example you could separate your alphanumerical column first and then sort it accordingly. You could:

df['numericalValue'] = df['alphaNumerical'][1:] # Assuming you only have 1 letter

df = df.sort_values(['numericalValue', 'alphaNumerical'], ascending = (True, True))

This would first sort your dataframe by the numerical value, as shown in your example and then secondarily sort the values that share the same numerical value by its letter.

If you have a variable number of non-numeric characters you can use something like this:

Good luck!

Upvotes: 0

Related Questions