Amit
Amit

Reputation: 793

How to get the column for a column having max value in python?

I have a dataframe:

df = pd.DataFrame({'p1_c': [1,10,3,40],
                   'p2_c': [4,3,4,14],
                   'p3_c': [5,2,8,78],
                   'p1_v': [776,1776,300,600],
                   'p2_v': [998,98,100,100],
                   'p3_v': [800,80,50,500]
                  })
df
   p1_c  p2_c  p3_c  p1_v  p2_v  p3_v
0     1     4     5   776   998   800
1    10     3     2  1776    98    80
2     3     4     8   300   100    50
3    40    14    78   600   100   500

(p1_c,p1_v),(p2_c,p2_v),(p3_c,p3_v) are in groups and describing same thing.
I want the get the v-th column value of the one having the highest c-th value

For example see in row 0 :

df[[p1_c,p2_c,p3_c]].max(axis=1)   
#get the max value i.e p3_c

So I want p3_v value as an output (i.e 800). Same pattern followed by row 1,2,and 3.

EXPECTED OUTPUT

   p1_c  p2_c  p3_c  p1_v   p2_v  p3_v  RESULT
 0  1     4     5    776    998    800   800
 1  10    3     2    1776    98    80    1776
 2  3     4     8    300    100    50    50
 3  40   154    78   600    100    500   100

Upvotes: 3

Views: 106

Answers (5)

MrNobody33
MrNobody33

Reputation: 6483

You could try this with df.to_records, that I find simpler, and could be the fastest way:

df['RESULT']=[row[list(row).index(max(list(row)[1:4]))+3]for row in df.to_records()]
print(df)

Output:

   p1_c  p2_c  p3_c  p1_v  p2_v  p3_v  RESULT
0     1     4     5   776   998   800     800
1    10     3     2  1776    98    80    1776
2     3     4     8   300   100    50      50
3    40    14    78   600   100   500     500

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use, Series.str.endswith along with DataFrame.idxmax along axis=1 to extract the column names which ends with _c and containing the max values, then use Series.str.rstrip along with .add to replace the _c with _v in those column names, finally use DataFrame.lookup to get the values from the dataframe based on this cols:

cols = df.loc[:, df.columns.str.endswith('_c')].idxmax(axis=1)
cols = cols.str.rstrip('_c').add('_v')
df['RESULT'] = df.lookup(df.index, cols)

Result:

# print(df)

   p1_c  p2_c  p3_c  p1_v  p2_v  p3_v  RESULT
0     1     4     5   776   998   800     800
1    10     3     2  1776    98    80    1776
2     3     4     8   300   100    50      50
3    40   154    78   600   100   500     100

Upvotes: 0

NAGA RAJ S
NAGA RAJ S

Reputation: 453

Try this

data={'p1_c':[1,10,3,40],'p2_c':[4,3,3,14],'p3_c':[5,2,8,78],'p1_v':[776,1776,300,600],'p2_v':[998,98,100,100],'p3_v':[800,80,50,500]}
df_1=pd.DataFrame(data)
list=[]
for i in range(4):
  if df_1[['p1_c','p2_c','p3_c']].max(axis=1)[i]==df_1.loc[i,'p3_c']:
     list.append(df_1.loc[i,'p3_v'])
  else:
     list.append(df_1[['p1_v','p2_v']].max(axis=1)[i])


df_1['Result']=list
df_1

output

  p1_c  p2_c    p3_c    p1_v    p2_v    p3_v    Result
0         1     4   5   776     998       800     800
1         10    3   2   1776    98        80      1776
2          3    3   8   300     100       50      50
3         40    14  78  600     100       500     500

Upvotes: 0

Berkay
Berkay

Reputation: 403

Definitely not an ideal solution but it solves the problem.

from pandas import DataFrame

df = DataFrame({'p1_c': [1,10,3,40], 'p2_c': [4,3,4,14], 'p3_c': [5,2,8,78], 'p1_v': [776, 1776, 300, 600], 'p2_v': [998, 98, 100, 100], 'p3_v': [800, 80, 50, 500]})

df['RESULT'] = [df.loc[i, df.loc[i, df.columns[:3]].idxmax(axis=1)[:-1] + 'v'] for i in range(len(df))]
   p1_c  p2_c  p3_c  p1_v  p2_v  p3_v  RESULT
0     1     4     5   776   998   800     800
1    10     3     2  1776    98    80    1776
2     3     4     8   300   100    50      50
3    40    14    78   600   100   500     500

Upvotes: 0

Ben.T
Ben.T

Reputation: 29635

you can do it with filter to get _c and _v columns and use where to replace by nan values in _v columns where the value in _c column are not the mask per row. then get the max

#get only _c columns
dfc = df.filter(regex='_c')

df['RESULT'] = (df.filter(regex='_v') #get _p columns
                  #keep value in _p where max in _c per row 
                  .where(dfc.eq(dfc.max(axis=1).to_numpy()[:, None]).to_numpy())
                  .max(axis=1) #get the max
               )
print (df)
   p1_c  p2_c  p3_c  p1_v  p2_v  p3_v  RESULT
0     1     4     5   776   998   800   800.0
1    10     3     2  1776    98    80  1776.0
2     3     4     8   300   100    50    50.0
3    40   154    78   600   100   500   100.0

Upvotes: 0

Related Questions