bulala
bulala

Reputation: 45

How to get a certain value's column name in a dataframe

I have a table like this

uid|store_1_@_A|store_2_%_7_B|store_3_&_9_C
---------------------------------------
1  |3          |4            |5
2  |20         |1            |9
3  |4          |88           |49    

I want to form a new table like this:

uid|store_1_@_A|store_2_%_7_B|store_3_&_9_C|favorite_store
------------------------------------------------------
1  |3          |4            |5            |C
2  |20         |1            |9            |A
3  |4          |88           |49           |B

favorite_store is a new variable. For every uid, check the value of three stores, and find one with the highest value, e.g: uid =1, the max value = 5, belongs to store_3_&_9_C, so favorite_store = C.

df = pd.DataFrame({'uid':[1,2,3],
                   'store_1_@_A':[3,20,4],
                   'store_2_%_7_B':[4,1,88],
                   'store_3_&_9_C':[5,9,49]})

I used df.iloc[0].max() to get the max value of first row but I don't know how to proceed. In case of how to get the last character in store's name, i was thinking using the last '_' . Such as re.findall('[^\_]+$', 're.findall('[^\_]+$', 'store_3_A')[0]')[0] may work.

Upvotes: 1

Views: 81

Answers (3)

Ami Tavory
Ami Tavory

Reputation: 76297

Use idxmax:

df['favorite_store'] = df.idxmax(axis=1)

Upvotes: 2

oppressionslayer
oppressionslayer

Reputation: 7204

Try this:

df['favorite_store'] = df.T.idxmax()                                                                                                                                                

In [5248]: df                                                                                                                                                                                  
Out[5248]: 
   uid  store_1  store_2  store_3 favorite_store
0    1        3        4        5        store_3
1    2       20        1        9        store_1
2    3        4       88       49        store_2

or

df['favorite_store'] = df.T.idxmax().str.extract(r'store_(\d+)')                                                                                                                    

In [5266]: df                                                                                                                                                                                  
Out[5266]: 
   uid  store_1  store_2  store_3 favorite_store
0    1        3        4        5              3
1    2       20        1        9              1
2    3        4       88       49              2

Upvotes: 1

Neal Titus Thomas
Neal Titus Thomas

Reputation: 483

Try this:

import pandas as pd
df = pd.DataFrame({'uid':[1,2,3],
               'store_1':[3,20,4],
               'store_2':[4,1,88],
               'store_3':[5,9,49]})

fav_full = (df.idxmax(axis=1))
fav_value = []
for ele in fav_full:
    fav_value.append(ele[6:])
df['Favourite column'] = (fav_value)
print (df)

There must be a cleaner way to do this. But this will work if all your 'store' headers are in the format: store_x

Upvotes: 0

Related Questions