Colton Medler

Reputation: 91

Python Pandas: Group by a column to obtain two peaks in a DataFrame column

I am using Python Pandas to group by a column called "Trace". For each trace, the "Value" column has two peaks that I am trying to transfer to a different DataFrame. The first problem is that when I use groupby, it doesn't keep the rest of the data from the row of the value I want to select. For example, if a DataFrame had six columns, I want to preserve all six columns after the groupby. The second problem is that the two values I want are not the two greatest values in the column, but rather "peaks" in the data. For example, the attached image shows the two peaks whose values I want. I want the greatest value of each of the two peaks exported to a new DataFrame, together with the values from the other columns in the same rows.
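For the first problem, here is a minimal sketch using the values listed in the question. It assumes Trace 2's values correspond to Samples 1 to 11, because the posted "Value" list is one entry short for Trace 2. An aggregation such as max() returns only the grouped "Value" column, whereas idxmax() returns the row label of each group's maximum, so .loc can pull back the complete rows. This keeps a single maximum per Trace; it does not by itself handle the two-peak part.

```python
import pandas as pd

# Values copied from the question. Assumption: the question's list has only 11
# Trace 2 values, so they are numbered Sample 1-11 here.
trace1 = [1, 2, 3, 7, 52, 33, 11, 4, 2, 21, 10, 3]
trace2 = [3, 7, 15, 61, 37, 16, 6, 3, 11, 23, 4]
df = pd.DataFrame({
    "Trace": [1] * len(trace1) + [2] * len(trace2),
    "Sample": list(range(1, len(trace1) + 1)) + list(range(1, len(trace2) + 1)),
    "Value": trace1 + trace2,
})

# Aggregating returns one number per Trace; the "Sample" column is lost.
print(df.groupby("Trace")["Value"].max())

# idxmax gives the row label of each Trace's maximum (first one on ties),
# so .loc returns those whole rows with every column intact.
max_rows = df.loc[df.groupby("Trace")["Value"].idxmax()]
print(max_rows)
```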

In the following code, I want to group by the "Trace" column and pick the two peaks in the "Value" column, while still preserving the "Sample" column after choosing the peaks. The peaks I want are 52 and 21 for Trace 1, and 61 and 23 for Trace 2.

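import pandas as pd
# "Value" has 23 entries but "Trace" and "Sample" have 24 each, so pd.DataFrame(d) raises a ValueError as written.
d = {"Trace": [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
     "Sample": [1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
     "Value": [1,2,3,7,52,33,11,4,2,21,10,3,3,7,15,61,37,16,6,3,11,23,4]}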

Any suggestions? I have been using .groupby("Trace") and .nlargest().

Upvotes: 0

Views: 546

Answers (1)

Vivek Kalyanarangan

Reputation: 9081

The choice of the "peak" confuses me; unless you hardcode the Trace values, I don't think you will get far.

For anyone who lands here from a search, here is how to combine groupby with nlargest while keeping all the columns:

df.groupby(['Trace']).apply(lambda x: x.nlargest(2, columns=['Value']))

Output

          Sample  Trace  Value
Trace                         
1     3        4      1     12
      4        5      1      9
2     13       4      2     15
      14       5      2     11

If by the two "peak" values you mean the two largest values in the Value column within each Trace, this should be a concise solution.
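A quick check against the question's own values (again assuming Trace 2's 11 values are Samples 1 to 11, since the posted list is one short) shows the difference: the two largest values for Trace 1 are 52 and 33, not the peaks 52 and 21. If "peak" means a local maximum, one pandas-only sketch marks rows whose Value is strictly greater than both neighbors within the same Trace, then keeps the two tallest such rows per Trace. The neighbor rule is an assumption about what counts as a peak (a flat top of equal values would not be detected), and sort_values plus groupby().head(2) stands in for the apply/nlargest call; for this data it selects the same rows.

```python
import pandas as pd

# Values copied from the question. Assumption: Trace 2's 11 values are Samples 1-11.
trace1 = [1, 2, 3, 7, 52, 33, 11, 4, 2, 21, 10, 3]
trace2 = [3, 7, 15, 61, 37, 16, 6, 3, 11, 23, 4]
df = pd.DataFrame({
    "Trace": [1] * len(trace1) + [2] * len(trace2),
    "Sample": list(range(1, len(trace1) + 1)) + list(range(1, len(trace2) + 1)),
    "Value": trace1 + trace2,
})

# Two largest values per Trace, all columns kept.
largest = (
    df.sort_values("Value", ascending=False)
    .groupby("Trace")
    .head(2)
    .sort_values(["Trace", "Sample"])
)
print(largest)  # Trace 1 gives 52 and 33, not the peak at 21

# A local peak is strictly greater than the previous and next Value in its Trace.
# shift() works within each group, so neighbors never cross Trace boundaries.
grouped = df.groupby("Trace")["Value"]
is_peak = (df["Value"] > grouped.shift(1)) & (df["Value"] > grouped.shift(-1))
# First/last rows of each Trace compare against NaN, which is False,
# so endpoints never count as peaks.

top_peaks = (
    df[is_peak]
    .sort_values(["Trace", "Value"], ascending=[True, False])
    .groupby("Trace")
    .head(2)  # keep at most the two tallest peaks per Trace
    .sort_values(["Trace", "Sample"])
    .reset_index(drop=True)
)
print(top_peaks)
```

With this data each Trace has exactly two local maxima, so head(2) changes nothing here; it only matters when noisier traces produce extra small peaks.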

Upvotes: 1
