othnin
othnin

Reputation: 93

Copy contents of dataframe to another dataframe with same rows and cols

I need to convert a dataframe with multiple columns to a "ranked order" table by rows. This is my test data:

raw = [{'MS':100, 'KR':1, 'KU': 0.1, 'BR': 33},
       {'MS':9,   'KR':4, 'KU': 0.4, 'BR': 40},
       {'MS':40,  'KR':6, 'KU': 0.2, 'BR': 50},
       {'MS':59,  'KR':3, 'KU': 0.4, 'BR': 30},
       {'MS':33,  'KR':8, 'KU': 0.5, 'BR': 20},
       {'MS':40,  'KR':2, 'KU': 0.2, 'BR': 90},
       {'MS':70,  'KR':2, 'KU': 0.01, 'BR': 80},
    ]

raw_df = pd.DataFrame(raw, index=['A','B','C','D','E','F','G'])

Basically what I need is to go from here:

    MS   KR  KU  BR 
A  100   1  0.10 33 
B    9   4  0.40 40 
C   40   6  0.20 50 
D   59   3  0.40 30 
E   33   8  0.50 20 
F   40   2  0.20 90 
G   70   2  0.01 80 

to here(where the highest raw values are ranked higher):

    MS   KR   KU    BR   
C   40   6    0.20  50   
E   33   8    0.50  20  
D   59   3    0.40  30  
F   40   2    0.20  90 
B   9    4    0.40  40  
G   70   2    0.01  80  
A   100   1   0.10  33  

I have rank ordered every column and then created a row rank by taking the average and then sorted the dataframe by that new row. Right now I have the following ranked dataframe but I need to copy over the contents with the old raw data and not the current data.

    MS   KR   KU   BR   rank
C  4.5  2.0  4.5  3.0  3.500
E  6.0  1.0  1.0  7.0  3.750
D  3.0  4.0  2.5  6.0  3.875
F  4.5  5.5  4.5  1.0  3.875
B  7.0  3.0  2.5  4.0  4.125
G  2.0  5.5  7.0  2.0  4.125
A  1.0  7.0  6.0  5.0  4.750

I can use iterrrow() but it is expensive for large datasets. Is there some panda-fu here that can accomplish this?

Upvotes: 1

Views: 170

Answers (2)

Arkadiusz
Arkadiusz

Reputation: 1875

If you create it from raw_df you can do:

raw_df.reindex(raw_df.rank(ascending=False).mean(axis=1).sort_values().index)

Output:

     MS  KR    KU   BR
C    40   6  0.20   50
E    33   8  0.50   20
D    59   3  0.40   30
F    40   2  0.20   90
B     9   4  0.40   40
G    70   2  0.01   80
A   100   1  0.10   33

We set rank by descending order, takes its mean by columns and sort raw_df by using .reindex() method.

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195408

If I've understood you right: If df_rank contains your sorted dataframe and raw_df your unsorted one, you can do:

print(raw_df.loc[df_rank.index])

Prints:

    MS  KR    KU  BR
C   40   6  0.20  50
E   33   8  0.50  20
D   59   3  0.40  30
F   40   2  0.20  90
B    9   4  0.40  40
G   70   2  0.01  80
A  100   1  0.10  33

Upvotes: 2

Related Questions