Kevin Choi
Kevin Choi

Reputation: 765

Pandas Sorting by each row

Date        Count_Doc   Sum_Words   S&P 500     Russel 2000  Nasdaq     
2017-02-16  0.069946    3.839240    -0.568454   -0.514334   -0.592410
2017-04-12  1.655428    3.667811    -0.891697   -1.450381   -1.047976
2017-04-19  2.371889    2.110689    -0.284174   0.401092    0.427705
2017-04-20  3.261538    2.995514    1.846039    1.360092    1.660339
2017-05-02  0.738549    2.197852    0.081593    -0.849580   -0.231491

I want to leave "Count_Doc" and "Sum_Words" columns unchanged, but I am trying to sort the rest columns to be sorted by its values like below: (The order is not sorted, just ramdomly suffled )

Date        Count_Doc   Sum_Words   1st         2nd         3rd
2017-02-16  0.069946    3.839240    S&P 500     Nasdaq      Russel 2000
2017-04-12  1.655428    3.667811    Nasdaq      S&P 500     Russel 2000
2017-04-19  2.371889    2.110689    Nasdaq      S&P 500     Russel 2000
2017-04-20  3.261538    2.995514    Russel 2000 Nasdaq      S&P 500 
2017-05-02  0.738549    2.197852    Russel 2000 S&P 500     Nasdaq  

Is there any way to return the name of column as DataFrame values like that?

Thank you!

Upvotes: 0

Views: 65

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Use this:

df = df.set_index(['Date','Count_Doc','Sum_Words'])
df_out = pd.DataFrame(df.columns[df.values.argsort(1)[::-1]].values, 
                       df.index, 
                       columns=['1st','2nd','3rd']).reset_index()
df_out

Output:

         Date  Count_Doc  Sum_Words          1st          2nd          3rd
0  2017-02-16   0.069946   3.839240  Russel 2000       Nasdaq      S&P 500
1  2017-04-12   1.655428   3.667811  Russel 2000       Nasdaq      S&P 500
2  2017-04-19   2.371889   2.110689      S&P 500  Russel 2000       Nasdaq
3  2017-04-20   3.261538   2.995514  Russel 2000       Nasdaq      S&P 500
4  2017-05-02   0.738549   2.197852       Nasdaq      S&P 500  Russel 2000

Upvotes: 1

Haleemur Ali
Haleemur Ali

Reputation: 28233

you can add 3 additional columns to your dataframe by ranking from the 3 indexes for each row.

df[['1st', '2nd', '3rd']] = df.iloc[:, [3,4,5]].apply(lambda x: pd.Series(x.sort_values(ascending=False).index), axis=1)
outputs:

         Date  Count_Doc  Sum_Words     ...               1st          2nd          3rd
0  2017-02-16   0.069946   3.839240     ...       Russel 2000      S&P 500       Nasdaq
1  2017-04-12   1.655428   3.667811     ...           S&P 500       Nasdaq  Russel 2000
2  2017-04-19   2.371889   2.110689     ...            Nasdaq  Russel 2000      S&P 500
3  2017-04-20   3.261538   2.995514     ...           S&P 500       Nasdaq  Russel 2000
4  2017-05-02   0.738549   2.197852     ...           S&P 500       Nasdaq  Russel 2000

here, i'm picking only the 3 columns whose names we want to sort, then applying a function row-wise which takes a series, sorts it, gets its indexes (i.e. names) and returns the indexes as a new series.

this is then assigned to the new columns ['1st', '2nd', '3rd'].

note, the sort order I used is descending, whereas in your example output you've just shown a random order.

Upvotes: 1

Related Questions