Colton Medler

Reputation: 91

Python Pandas - Find the first instance of a value exceeding a threshold

I am trying to find, for each group defined by one Pandas DataFrame column, the first row in which another column exceeds a threshold. In the code below, the "Trace" column has the same number for multiple rows. For each Trace, I want to find the first row where the "Value" column exceeds 3. Then, I want to take the rest of the information from that row and export it to a new Pandas DataFrame (like in the second example). Any ideas?

import pandas as pd

d = {"Trace": [1,1,1,1,2,2,2,2], "Date": [1,2,3,4,1,2,3,4], "Value": [1.5,1.9,3.1,5.5,1.1,3.6,1.9,6.2]}

df = pd.DataFrame(data=d)


Upvotes: 3

Views: 8208

Answers (3)

BENY

Reputation: 323226

By using idxmax

df.loc[(df.Value>3).groupby(df.Trace).idxmax()]
Out[602]: 
   Date  Trace  Value
2     3      1    3.1
5     2      2    3.6
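One caveat worth checking with this approach: if a Trace has no Value above the threshold, the mask is all False for that group and idxmax still returns the group's first row. Below is a minimal sketch of one way to guard against that. The extra Trace 3 rows are made-up data for illustration, and the cast to int is only a precaution so that idxmax compares plain 0/1 values.

```python
import pandas as pd

# Trace 3 is hypothetical extra data: it never exceeds the threshold.
df = pd.DataFrame({
    "Trace": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Date": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    "Value": [1.5, 1.9, 3.1, 5.5, 1.1, 3.6, 1.9, 6.2, 0.5, 0.7],
})

mask = df["Value"] > 3

# Index label of the first True per Trace. For an all-False group this
# falls back to the group's first row, which is not a real match.
first_idx = mask.astype(int).groupby(df["Trace"]).idxmax()
unguarded = df.loc[first_idx]

# Keep only the labels where the mask is actually True.
guarded = df.loc[first_idx[mask.loc[first_idx].to_numpy()]]

print(unguarded)
print(guarded)
```

Here the unguarded result includes a row for Trace 3 even though none of its values exceed 3, while the guarded result drops it.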

Upvotes: 6

Brad Solomon

Reputation: 40878

You can also achieve this with .groupby().head(1):

>>> df.loc[df.Value > 3].groupby('Trace').head(1)
   Date  Trace  Value
2     3      1    3.1
5     2      2    3.6

This finds the first occurrence (given whatever order your DataFrame is currently in) of the row with Value > 3 for each Trace.
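Because the result depends on the current row order, it may help to sort first if "first" is supposed to mean the earliest Date. A rough sketch, assuming that Date defines the intended order and using a shuffled copy of the question's data:

```python
import pandas as pd

# Same rows as in the question, deliberately shuffled.
df = pd.DataFrame({
    "Trace": [2, 1, 1, 2, 1, 2, 1, 2],
    "Date": [4, 3, 1, 2, 4, 1, 2, 3],
    "Value": [6.2, 3.1, 1.5, 3.6, 5.5, 1.1, 1.9, 1.9],
})

# Sort so that "first" means the earliest Date within each Trace.
ordered = df.sort_values(["Trace", "Date"])

# Filter on the threshold, then keep the first remaining row per Trace.
first_hits = ordered.loc[ordered["Value"] > 3].groupby("Trace").head(1)
print(first_hits)
```

Without the sort, the shuffled frame would return the Date 4 row for Trace 2, since that row happens to come first.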

Upvotes: 2

ImportanceOfBeingErnest

Reputation: 339122

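One option is to first filter by the condition (Value > 3) and then only take the first entry for each Trace. The following assumes that Trace is numeric and that the rows are sorted by Trace in ascending order, since it detects the start of each group by a positive difference between consecutive Trace values.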

import numpy as np
import pandas as pd

df = pd.DataFrame({"Trace" : np.repeat([1,2],4),
                   "Value" : [1.5, 1.9, 3.1, 5.5, 1.1, 3.6, 1.9, 6.2]})

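# Keep only the rows above the threshold.
df = df.loc[df.Value > 3.0]
# Prepend (first Trace - 1) so the first row also counts as an increase,
# then keep the rows where Trace increases relative to the previous row.
df = df.loc[np.diff(np.concatenate(([df.Trace.values[0]-1],df.Trace.values))) > 0]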
print(df)

This prints

    Trace  Value
 2      1    3.1
 5      2    3.6
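If Trace is not numeric, the same filter-then-take-first idea can be sketched with drop_duplicates instead of np.diff. This swaps in a different technique from the one above, and the string labels "a" and "b" are made up for illustration.

```python
import pandas as pd

# Hypothetical non-numeric Trace labels with the same Value column.
df = pd.DataFrame({
    "Trace": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Value": [1.5, 1.9, 3.1, 5.5, 1.1, 3.6, 1.9, 6.2],
})

# Filter on the threshold, then keep the first remaining row per Trace.
first_hits = df.loc[df["Value"] > 3.0].drop_duplicates(subset="Trace", keep="first")
print(first_hits)
```

This keeps the first qualifying row per label in the current row order and does not require the labels to be sorted.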

Upvotes: 0
