Reputation: 95
I have a dataframe that look like this:
NAME MONTH TIME
Paul Jan 3
Paul Sept 1
Joe Jan 3
Joe Aug 3
And I transformed it to a df like this one, using pivot:
NAME JAN SEPT AUG
Paul 3 1 0
Joe 3 0 3
Now I'm creating a new column with the biggest value for every row, and it looks like this:
NAME JAN SEPT AUG 1_MAX
Paul 3 1 0 3
Joe 3 0 3 3
And then, I'm assigning 0 in a temporary dataframe to the old biggest value, to now get the second biggest value, and look like this:
NAME JAN SEPT AUG 1_MAX 2_MAX
Paul 3 1 0 3 1
Joe 3 0 3 3 3
But because Joe's have 2 times 3, in Jan and August, when I assign 0 to the biggest one, who should just be 3 for JAN that is the first time the biggest value appear, it changes to 0 all max instances. It becomes like this, which is not what I want:
NAME JAN SEPT AUG 1_MAX 2_MAX
Paul 3 1 0 3 1
Joe 3 0 3 3 0
I'm using:
f_temp1 = df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)
to change the biggest value to zero, but this replaces all the biggest values, I would like to replace the maximum value of the row just in the first time it appears.
I need a generic solution because I'm working in a big dataframe.
Upvotes: 5
Views: 384
Reputation: 1393
This is another efficient way to achieve what you want, especially in a big dataframe:
import pandas as pd
import numpy as np
# Assuming this is your df
df = pd.DataFrame({'NAME': ['Paul','Joe'],'JAN': [3,3],
'SEPT': [1,0],'AUG':[0,3]})
# Sort using numpy array, don't wanna include the Name column,
# create 2 new columns with the first 2 max values
df[['2_MAX','1_MAX']] = pd.DataFrame(np.sort(np.delete(df.values, 0, 1))
[:,-2:], columns=['2_MAX','1_MAX'])
Output:
NAME JAN SEPT AUG 2_MAX 1_MAX
0 Paul 3 1 0 1 3
1 Joe 3 0 3 3 3
Upvotes: 0
Reputation: 59529
Use numpy to sort
the underlying array (assuming 'Name'
is in the index) and join back the max values.
import pandas as pd
import numpy as np
N = 2
pd.concat([df, pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, -N:],
index=df.index,
columns=[f'{i}_MAX' for i in range(N, 0, -1)])],
axis=1)
JAN SEPT AUG 2_MAX 1_MAX
NAME
Paul 3 1 0 1 3
Joe 3 0 3 3 3
Upvotes: 3
Reputation: 30920
Use:
df[['1_MAX','2_MAX']]=(df.loc[:,'JAN':]
.apply(lambda x: pd.Series(np.sort(np.unique(x))[-2:]),
axis=1)
.loc[:,[1,0]])
print(df)
NAME JAN SEPT AUG 1_MAX 2_MAX
0 Paul 3 1 0 3 1
1 Joe 3 0 3 3 0
Initial df
NAME JAN SEPT AUG
0 Paul 3 1 0
1 Joe 3 0 3
Upvotes: 2