Reputation: 793
I have a dataframe:
df = pd.DataFrame({'p1_c': [1,10,3,40],
'p2_c': [4,3,4,14],
'p3_c': [5,2,8,78],
'p1_v': [776,1776,300,600],
'p2_v': [998,98,100,100],
'p3_v': [800,80,50,500]
})
df
p1_c p2_c p3_c p1_v p2_v p3_v
0 1 4 5 776 998 800
1 10 3 2 1776 98 80
2 3 4 8 300 100 50
3 40 14 78 600 100 500
(p1_c,p1_v),(p2_c,p2_v),(p3_c,p3_v) are in groups and describing same thing.
I want the get the v-th column value of the one having the highest c-th value
For example see in row 0 :
df[[p1_c,p2_c,p3_c]].max(axis=1)
#get the max value i.e p3_c
So I want p3_v value as an output (i.e 800). Same pattern followed by row 1,2,and 3.
EXPECTED OUTPUT
p1_c p2_c p3_c p1_v p2_v p3_v RESULT
0 1 4 5 776 998 800 800
1 10 3 2 1776 98 80 1776
2 3 4 8 300 100 50 50
3 40 154 78 600 100 500 100
Upvotes: 3
Views: 106
Reputation: 6483
You could try this with df.to_records
, that I find simpler, and could be the fastest way:
df['RESULT']=[row[list(row).index(max(list(row)[1:4]))+3]for row in df.to_records()]
print(df)
Output:
p1_c p2_c p3_c p1_v p2_v p3_v RESULT
0 1 4 5 776 998 800 800
1 10 3 2 1776 98 80 1776
2 3 4 8 300 100 50 50
3 40 14 78 600 100 500 500
Upvotes: 1
Reputation: 71689
Use, Series.str.endswith
along with DataFrame.idxmax
along axis=1
to extract the column names which ends with _c
and containing the max values, then use Series.str.rstrip
along with .add
to replace the _c
with _v
in those column names, finally use DataFrame.lookup
to get the values from the dataframe based on this cols
:
cols = df.loc[:, df.columns.str.endswith('_c')].idxmax(axis=1)
cols = cols.str.rstrip('_c').add('_v')
df['RESULT'] = df.lookup(df.index, cols)
Result:
# print(df)
p1_c p2_c p3_c p1_v p2_v p3_v RESULT
0 1 4 5 776 998 800 800
1 10 3 2 1776 98 80 1776
2 3 4 8 300 100 50 50
3 40 154 78 600 100 500 100
Upvotes: 0
Reputation: 453
Try this
data={'p1_c':[1,10,3,40],'p2_c':[4,3,3,14],'p3_c':[5,2,8,78],'p1_v':[776,1776,300,600],'p2_v':[998,98,100,100],'p3_v':[800,80,50,500]}
df_1=pd.DataFrame(data)
list=[]
for i in range(4):
if df_1[['p1_c','p2_c','p3_c']].max(axis=1)[i]==df_1.loc[i,'p3_c']:
list.append(df_1.loc[i,'p3_v'])
else:
list.append(df_1[['p1_v','p2_v']].max(axis=1)[i])
df_1['Result']=list
df_1
output
p1_c p2_c p3_c p1_v p2_v p3_v Result
0 1 4 5 776 998 800 800
1 10 3 2 1776 98 80 1776
2 3 3 8 300 100 50 50
3 40 14 78 600 100 500 500
Upvotes: 0
Reputation: 403
Definitely not an ideal solution but it solves the problem.
from pandas import DataFrame
df = DataFrame({'p1_c': [1,10,3,40], 'p2_c': [4,3,4,14], 'p3_c': [5,2,8,78], 'p1_v': [776, 1776, 300, 600], 'p2_v': [998, 98, 100, 100], 'p3_v': [800, 80, 50, 500]})
df['RESULT'] = [df.loc[i, df.loc[i, df.columns[:3]].idxmax(axis=1)[:-1] + 'v'] for i in range(len(df))]
p1_c p2_c p3_c p1_v p2_v p3_v RESULT
0 1 4 5 776 998 800 800
1 10 3 2 1776 98 80 1776
2 3 4 8 300 100 50 50
3 40 14 78 600 100 500 500
Upvotes: 0
Reputation: 29635
you can do it with filter
to get _c and _v columns and use where
to replace by nan values in _v columns where the value in _c column are not the mask per row. then get the max
#get only _c columns
dfc = df.filter(regex='_c')
df['RESULT'] = (df.filter(regex='_v') #get _p columns
#keep value in _p where max in _c per row
.where(dfc.eq(dfc.max(axis=1).to_numpy()[:, None]).to_numpy())
.max(axis=1) #get the max
)
print (df)
p1_c p2_c p3_c p1_v p2_v p3_v RESULT
0 1 4 5 776 998 800 800.0
1 10 3 2 1776 98 80 1776.0
2 3 4 8 300 100 50 50.0
3 40 154 78 600 100 500 100.0
Upvotes: 0