Reputation: 3667
I have a dataframe df like
Sample Percentage Attribute1 Attribute2
1_A 12.3 xxxx yyyy
1_A 5.0 aaaa bbbb
2_B 10 ccccc ddddd
For each sample, I want to choose the row which has the maximum value in Percentage and make a dataframe df1. If there exist multiple percentages for a sample (such as 1_A), I would like to make a dataframe df2 in which all the rows other than the one that was chosen for df1. In the example above: df1 will have rows of Samples 1_A and 2_B and df2 will have row from Sample 2B. I tried groupby('Percentage')
followed by apply(list)
but was not successful for df1.
Upvotes: 0
Views: 81
Reputation: 323236
IIUC drop_duplicates
df.sort_values('Percentage').drop_duplicates('Sample',keep='last')
Out[1046]:
Sample Percentage Attribute1 Attribute2
2 2_B 10.0 ccccc ddddd
0 1_A 12.3 xxxx yyyy
Upvotes: 2
Reputation: 51335
I would do this by first sorting the dataframe by Percentage
, so the higher percentages are at the bottom. Then, groupby
the Sample
column, and take the last row using tail(1)
to create df1
. Then, to create df2
, just find the rows of df
whose indices are not in df1
:
df1 = df.sort_values('Percentage').groupby('Sample').tail(1)
df2 = df[~df.index.isin(df1.index)]
>>> df1
Sample Percentage Attribute1 Attribute2
2 2_B 10.0 ccccc ddddd
0 1_A 12.3 xxxx yyyy
>>> df2
Sample Percentage Attribute1 Attribute2
1 1_A 5.0 aaaa bbbb
Upvotes: 0