secJ
secJ

Reputation: 509

Pandas: Sort DataFrame Cells across columns if unordered

Looks like my last question was closed, but I forgot to mention the update below the first time. Only modifying a few of the columns and not all.

What is the best way to modify (sort) a Series of data in a Pandas DataFrame? For example, after importing some data, colums should be in ascending order, but I need to reorder data if it is not. Data is being imported from a csv into a pandas.df.

           num_1    num_2   num_3
date
2020-02-03  17      22       36
2020-02-06  52      22       14
2020-02-10  5        8       29
2020-02-13  10      14       30
2020-02-17  7        8       19

I would ideally find the second row (panda Series) in the Dataframe as the record to be fixed:

           num_1    num_2   num_3    num_4     num_5
date
2020-02-06  52      22       14       25         27

And modify it to be: (Only sorting nums 1-3 and not touching columns 4 & 5)

           num_1    num_2   num_3    num_4   num_5
date
2020-02-06  14      22       52        25      27

I could iterate over the DataFrame and check for indexes that have Series data out of order by comparing each column to the column to it's right. Then write a custom sorter and rewrite that record back into the Dataframe, but that seems clunky.

I have to imagine there's a more Pythonic (Pandas) way to do this type of thing. I just can't find it in the pandas documentation. I don't want to reorder the rows just make sure the values are in the appropriate order within the columns.

Update: I forgot to mention one of the most critical aspects. There are other columns in the DataFrame that should not be touched. So in the example below, only sort (num_1, num_2, num_3) not the others. I'm guessing I can use the solutions posed already, split the DataFrame, sort the first part and re-merge them together. Is there an alternative?

Upvotes: 1

Views: 477

Answers (3)

nav610
nav610

Reputation: 791

The best way is to use the sort_values() function and only allow it work on the columns which require sorting.

for index, rows in df.iterrows(): 
    df[['col1','col2','col3']] = df[['col1','col2','col3']].sort_values(by=[index], axis = 1, ascending = True)

This loops through every row, makes the values in the desired columns ascending, and then resaves the columns.

Upvotes: 2

Ezer K
Ezer K

Reputation: 3739

Spliting and reconnecting does not sound bad to me, here is what I got:

cols_to_sort = ['num_1', 'num_2', 'num_3']
pd.concat([pd.DataFrame(np.sort(df[cols_to_sort].values), columns=cols_to_sort, index=df.index), df[df.columns[~df.columns.isin(cols_to_sort)]]], axis=1)

Upvotes: 2

Roim
Roim

Reputation: 3066

Pandas does not support what you ask for by default (as much as I know). Usually each column is a different feature, so changing it's value may seem a little odd. Anyway, pandas work extremely well with numpy. This is your rescue.

You can convert relevant columns to numpy array, sort by row, and then put the result back in the dataframe.

import numpy as np
cols_list = ["num_1","num_2","num_3"]
tmp_arr = np.array(df.loc[:, cols_list])
tmp_arr.sort(axis=1)
df.loc[:, cols_list] = tmp_arr

Full example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Day":range(1,5),"num_1":[5,2,7,1], "num_2":[2,7,4,10], "num_3":[7,27,64,10]})

print(df)

cols_list = ["num_1","num_2","num_3"]
tmp_arr = np.array(df.loc[:, cols_list])
tmp_arr.sort(axis=1)
df.loc[:, cols_list] = tmp_arr

print(df)

The first print result:

   Day  num_1  num_2  num_3
0    1      5      2      7
1    2      2      7     27
2    3      7      4     64
3    4      1     10     10

second print result:

   Day  num_1  num_2  num_3
0    1      2      5      7
1    2      2      7     27
2    3      4      7     64
3    4      1     10     10

You can select whatever columns you like (cols_list).

After I already wrote this I found the similar solution here: Fastest way to sort each row in a pandas dataframe

Upvotes: 0

Related Questions