Reputation: 91
I am using Python Pandas to group by a column called "Trace". For each trace, the "Value" column has two peaks that I am trying to transfer to a different dataframe. The first problem is that when I use groupby, it doesn't keep the rest of the data from the row of the value I want to select. For example, if a dataframe has six columns, I want to preserve all six columns after the groupby. The second problem is that the two maximums I want are not the two greatest values in the column, but rather "peaks" in the dataset; the attached image shows the two peaks whose values I want. I want the greatest value from each of the two peaks exported to a new dataframe, together with the row values from the other columns of the original dataframe.
In the following code, I want to group by the "Trace" column and pick the two peaks in the "Value" column, while still preserving the "Sample" column after choosing the peaks. The peaks I want are 52 and 21 for Trace 1, and 61 and 23 for Trace 2.
d = {"Trace": [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2], "Sample": [1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12], "Value": [1,2,3,7,52,33,11,4,2,21,10,3,3,7,15,61,37,16,6,3,11,23,4]}
Any suggestions? I have been using .groupby("Trace") and .nlargest().
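Concretely, an attempt along these lines returns only the "Value" column and drops everything else (a sketch of what I have been running, assuming the DataFrame is built from d above):

import pandas as pd

df = pd.DataFrame(d)
# Returns a Series of the two largest Values per Trace; Sample is lost
df.groupby("Trace")["Value"].nlargest(2)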
Upvotes: 0
Views: 546
Reputation: 9081
The choice of the "peaks" confuses me; unless you hardcode the values you want, I don't think nlargest alone will get you far (see the peak-detection sketch at the end of this answer). On a more practical note, for anyone searching here, this is how to combine groupby and nlargest while keeping all the columns:
import pandas as pd

df = pd.DataFrame(d)
df.groupby(['Trace']).apply(lambda x: x.nlargest(2, columns=['Value']))
Output (run against the data in the question):

          Trace  Sample  Value
Trace
1     4       1       5     52
      5       1       6     33
2     15      2       4     61
      16      2       5     37
Here, if by "peaks" you simply mean the two largest values in the Value column per Trace group, this should be an elegant solution. As the output shows, though, it returns 52/33 and 61/37, not the 52/21 and 61/23 you asked for.
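If you truly need local peaks rather than the two largest values, a sketch along these lines could work. It leans on scipy.signal.find_peaks, which is an assumption on my part (the question never mentions SciPy):

import pandas as pd
from scipy.signal import find_peaks

df = pd.DataFrame(d)

def two_peaks(group):
    # Indices of local maxima within this trace
    idx, _ = find_peaks(group["Value"].to_numpy())
    # Keep the two tallest local maxima, preserving every column
    return group.iloc[idx].nlargest(2, "Value")

df.groupby("Trace", group_keys=False).apply(two_peaks)

On the data above this yields 52 and 21 for Trace 1 and 61 and 23 for Trace 2, with the Sample column intact.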
Upvotes: 1