Reputation: 3
I have a table of over 10,000 rows and over 400 columns. For columns containing at least the string 'xyz', I need to find the max value of each row (within these 'xyz' columns), and create 2 new columns.
The 1st new column would contain the max value of each row of these 'xyz' columns.
The 2nd new column would contain the column name from which the max value was retrieved. I'm stuck at creating the 2nd column. I've tried some stuff which doesn't work like;
Match = df[CompCol].isin[SpecList].all(axis=1)
How should approach the 2nd column?
Upvotes: 0
Views: 1004
Reputation: 10913
another way using 'regex' and 'idmax.
df = pd.DataFrame({'xyz1': [10, 20, 30, 40], 'xyz2': [11, 12,13,14],'xyz3':[1,2,3,44],'abc':[100,101,102,103]})
df['maxval']= df.filter(regex='xyz').apply(max, axis=1)
df['maxval_col'] = df.filter(regex='xyz').idxmax(axis=1)
abc xyz1 xyz2 xyz3 maxval maxval_col
100 10 11 1 11 xyz2
101 20 12 2 20 xyz1
102 30 13 3 30 xyz1
103 40 14 44 44 xyz3
Upvotes: 3
Reputation: 36545
Does this work for you?
import pandas as pd
df = pd.DataFrame([(1,2,3,4),(2,1,1,4)], columns = ['xyz1','xyz2','xyz3','abc'])
cols = [k for k in df.columns if 'xyz' in k]
df['maxval'] = df[cols].apply(lambda s: max(zip(s, s.keys()))[0],1)
df['maxcol'] = df[cols].apply(lambda s: max(zip(s, s.keys()))[1],1)
df
Out[753]:
xyz1 xyz2 xyz3 abc maxval maxcol
0 1 2 3 4 3 xyz3
1 2 1 1 4 2 xyz1
Upvotes: 0