Joonsung Park
Joonsung Park

Reputation: 23

Choosing the values in the column based on the maximum values of other column

I am choosing the values in Pandas DataFrame.
I would like to choose the values in the columns 'One_T','Two_T','Three_T'(which means the total counts), based on the Ratios of the columns('One_R','Two_R','Three_R').

Comparing values is done by the columns('One_R','Two_R','Three_R') and choosing values will be done by columns ('One_T','Two_T','Three_T').

I would like to find the highest values among columns('One_R','Two_R','Three_R') and put values from columns 'One_T','Two_T','Three_T' in new column 'Highest'.

For example, the first row has the highest values in One_R than Two_R and Three_R. Then, the values in One_T will be filled the column named Highest.

The initial data frame is test below code and the desired result is the result in the below code.

test = pd.DataFrame([[150,30,140,20,120,19],[170,31,130,30,180,22],[230,45,100,50,140,40],
                    [140,28,80,10,60,10],[100,25,80,27,50,23]], index=['2019-01-01','2019-02-01','2019-03-01','2019-04-01','2019-05-01'],
                   columns=['One_T','One_R','Two_T','Two_R','Three_T','Three_R'])
 One_T  One_R   Two_T   Two_R   Three_T Three_R
2019-01-01  150 30  140 20  120 19
2019-02-01  170 31  130 30  180 22
2019-03-01  230 45  100 50  140 40
2019-04-01  140 28  80  10  60  10
2019-05-01  100 25  80  27  50  23
result = pd.DataFrame([[150,30,140,20,120,19,150],[170,31,130,30,180,22,170],[230,45,100,50,140,40,100],
                    [140,28,80,10,60,10,140],[100,25,80,27,50,23,80]], index=['2019-01-01','2019-02-01','2019-03-01','2019-04-01','2019-05-01'],
                   columns=['One_T','One_R','Two_T','Two_R','Three_T','Three_R','Highest'])
One_T   One_R   Two_T   Two_R   Three_T Three_R Highest
2019-01-01  150 30  140 20  120 19  150
2019-02-01  170 31  130 30  180 22  170
2019-03-01  230 45  100 50  140 40  100
2019-04-01  140 28  80  10  60  10  140
2019-05-01  100 25  80  27  50  23  80

Is there any way to do this?

Thank you for time and considerations.

Upvotes: 2

Views: 43

Answers (1)

anky
anky

Reputation: 75080

You can solve this using df.filter to select columns with the _R suffix, then idxmax. Then replace _R with _T and use df.lookup:

s = test.filter(like='_R').idxmax(1).str.replace('_R','_T')
test['Highest'] = test.lookup(s.index,s)
print(test)

            One_T  One_R  Two_T  Two_R  Three_T  Three_R  Highest
2019-01-01    150     30    140     20      120       19      150
2019-02-01    170     31    130     30      180       22      170
2019-03-01    230     45    100     50      140       40      100
2019-04-01    140     28     80     10       60       10      140
2019-05-01    100     25     80     27       50       23       80

Upvotes: 2

Related Questions