Reputation: 45
I have a table like this
uid|store_1_@_A|store_2_%_7_B|store_3_&_9_C
---------------------------------------
1 |3 |4 |5
2 |20 |1 |9
3 |4 |88 |49
I want to form a new table like this:
uid|store_1_@_A|store_2_%_7_B|store_3_&_9_C|favorite_store
------------------------------------------------------
1 |3 |4 |5 |C
2 |20 |1 |9 |A
3 |4 |88 |49 |B
favorite_store
is a new variable. For every uid, check the value of three stores, and find one with the highest value, e.g: uid =1, the max value = 5, belongs to store_3_&_9_C, so favorite_store = C
.
df = pd.DataFrame({'uid':[1,2,3],
'store_1_@_A':[3,20,4],
'store_2_%_7_B':[4,1,88],
'store_3_&_9_C':[5,9,49]})
I used df.iloc[0].max()
to get the max value of first row but I don't know how to proceed.
In case of how to get the last character in store's name, i was thinking using the last '_' . Such as re.findall('[^\_]+$', 're.findall('[^\_]+$', 'store_3_A')[0]')[0]
may work.
Upvotes: 1
Views: 81
Reputation: 7204
Try this:
df['favorite_store'] = df.T.idxmax()
In [5248]: df
Out[5248]:
uid store_1 store_2 store_3 favorite_store
0 1 3 4 5 store_3
1 2 20 1 9 store_1
2 3 4 88 49 store_2
or
df['favorite_store'] = df.T.idxmax().str.extract(r'store_(\d+)')
In [5266]: df
Out[5266]:
uid store_1 store_2 store_3 favorite_store
0 1 3 4 5 3
1 2 20 1 9 1
2 3 4 88 49 2
Upvotes: 1
Reputation: 483
Try this:
import pandas as pd
df = pd.DataFrame({'uid':[1,2,3],
'store_1':[3,20,4],
'store_2':[4,1,88],
'store_3':[5,9,49]})
fav_full = (df.idxmax(axis=1))
fav_value = []
for ele in fav_full:
fav_value.append(ele[6:])
df['Favourite column'] = (fav_value)
print (df)
There must be a cleaner way to do this. But this will work if all your 'store' headers are in the format: store_x
Upvotes: 0