Rick Batra
Rick Batra

Reputation: 61

Find the difference between the max value and 2nd highest value within a subset of pandas columns

I have a fairly large dataframe:

A B C D
0 17 36 45 54
1 18 23 17 17
2 74 47 8 46
3 48 38 96 83

I am trying to create a new column that is the (max value of the columns) - (2nd highest value) / (2nd highest value).

In this example it would look something like:

A B C D Diff
0 17 36 45 54 .20
1 18 23 17 17 .28
2 74 47 8 46 .57
3 48 38 96 83 .16

I've tried df['diff'] = df.loc[:, 'A': 'D'].max(axis=1) - df.iloc[:df.index.get_loc(df.loc[:, 'A': 'D'].idxmax(axis=1))] / ...

but even that part of the formula returns an error, nevermind including the final division. I'm sure there must be an easier way going about this.

Edit: Additionally, I am also trying to get the difference between the max value and the column that immediately precedes the max value. I know this is a somewhat different question, but I would appreciate any insight. Thank you!

Upvotes: 6

Views: 312

Answers (4)

BENY
BENY

Reputation: 323226

Let us try get the second Max value with mask

Max = df.max(1)
secMax = df.mask(df.eq(Max,0)).max(1)
df['Diff'] = (Max - secMax)/secMax
df
Out[69]: 
    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627

Upvotes: 1

anky
anky

Reputation: 75080

We can also make use of numpy sort and np.diff :

arr = np.sort(df,axis=1)[:,-2:]
df['Diff'] = np.diff(arr,axis=1)[:,0]/arr[:,0]

print(df)

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627

Upvotes: 2

Chris
Chris

Reputation: 29742

One way using pandas.Series.nlargest with pct_change:

df["Diff"] = df.apply(lambda x: x.nlargest(2).pct_change(-1)[0], axis=1)

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627

Upvotes: 4

Quang Hoang
Quang Hoang

Reputation: 150735

One way is to apply a udf:

def get_pct(x):
    xmax2, xmax = x.sort_values().tail(2)
    return (xmax-xmax2)/xmax2

df['Diff'] = df.apply(get_pct, axis=1)

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627

Upvotes: 3

Related Questions