kingjames23
kingjames23

Reputation: 95

how to replace just first instance of max value in dataframe pandas?

I have a dataframe that look like this:

NAME   MONTH  TIME
Paul    Jan     3
Paul    Sept    1
Joe      Jan    3
Joe     Aug     3

And I transformed it to a df like this one, using pivot:

NAME JAN SEPT AUG 
Paul  3    1   0
Joe   3    0   3

Now I'm creating a new column with the biggest value for every row, and it looks like this:

NAME JAN SEPT AUG 1_MAX
Paul  3    1   0    3
Joe   3    0   3    3

And then, I'm assigning 0 in a temporary dataframe to the old biggest value, to now get the second biggest value, and look like this:

NAME JAN SEPT AUG 1_MAX 2_MAX
Paul  3    1   0    3     1
Joe   3    0   3    3     3

But because Joe's have 2 times 3, in Jan and August, when I assign 0 to the biggest one, who should just be 3 for JAN that is the first time the biggest value appear, it changes to 0 all max instances. It becomes like this, which is not what I want:

NAME JAN SEPT AUG 1_MAX 2_MAX
Paul  3    1   0    3     1
Joe   3    0   3    3     0

I'm using:

f_temp1 = df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)

to change the biggest value to zero, but this replaces all the biggest values, I would like to replace the maximum value of the row just in the first time it appears.

I need a generic solution because I'm working in a big dataframe.

Upvotes: 5

Views: 384

Answers (3)

Dimitris Thomas
Dimitris Thomas

Reputation: 1393

This is another efficient way to achieve what you want, especially in a big dataframe:

import pandas as pd
import numpy as np

# Assuming this is your df
df = pd.DataFrame({'NAME': ['Paul','Joe'],'JAN': [3,3],
                   'SEPT': [1,0],'AUG':[0,3]})

# Sort using numpy array, don't wanna include the Name column,
# create 2 new columns with the first 2 max values
df[['2_MAX','1_MAX']] = pd.DataFrame(np.sort(np.delete(df.values, 0, 1))
                                     [:,-2:], columns=['2_MAX','1_MAX'])

Output:

    NAME JAN SEPT AUG 2_MAX 1_MAX
0   Paul   3    1   0     1     3
1    Joe   3    0   3     3     3

Upvotes: 0

ALollz
ALollz

Reputation: 59529

Use numpy to sort the underlying array (assuming 'Name' is in the index) and join back the max values.

import pandas as pd
import numpy as np
N = 2

pd.concat([df, pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, -N:],
                            index=df.index,
                            columns=[f'{i}_MAX' for i in range(N, 0, -1)])],
           axis=1)

      JAN  SEPT  AUG  2_MAX  1_MAX
NAME                              
Paul    3     1    0      1      3
Joe     3     0    3      3      3

Upvotes: 3

ansev
ansev

Reputation: 30920

Use:

df[['1_MAX','2_MAX']]=(df.loc[:,'JAN':]
                         .apply(lambda x: pd.Series(np.sort(np.unique(x))[-2:]),
                                axis=1)
                         .loc[:,[1,0]])
print(df)
   NAME  JAN  SEPT  AUG  1_MAX  2_MAX
0  Paul    3     1    0      3      1
1   Joe    3     0    3      3      0

Initial df

   NAME  JAN  SEPT  AUG
0  Paul    3     1    0
1   Joe    3     0    3

Upvotes: 2

Related Questions