Jared
Jared

Reputation: 3892

Find value greater than level - Python Pandas

In a time series (ordered tuples), what's the most efficient way to find the first time a criterion is met?

In particular, what's the most efficient way to determine when a value goes over 100 for the value of a column in a pandas data frame?

I was hoping for a clever vectorized solution, and not having to use df.iterrows().

For example, for price or count data, when a value exceeds 100. I.e. df['col'] > 100.

              price
date 
2005-01-01     98
2005-01-02     99
2005-01-03     100
2005-01-04     99
2005-01-05     98
2005-01-06     100
2005-01-07     100
2005-01-08     98

but for potentially very large series. Is it better to iterate (slow) or is there a vectorized solution?

A df.iterrows() solution could be:

for row, ind in df.iterrows():
    if row['col'] > value_to_check:
        breakpoint = row['value_to_record'].loc[ind]
        return breakpoint
return None

But my question is more about efficiency (potentially, a vectorized solution that will scale well).

Upvotes: 17

Views: 53133

Answers (3)

Matoran
Matoran

Reputation: 154

numpy argmax returns when it finds the first True value. Numpy is a pandas dependency and is then available for you directly :).

df.iloc[np.argmax(df['col'] > 100)] 

Upvotes: 2

user3304496
user3304496

Reputation: 121

This will return the index value of the first occurrence of 100 in the series:

 index_value = (df['col'] - 100).apply(abs).idxmin()

If there is no value exactly 100, it should return the index of the closest value.

Upvotes: 5

Merlin
Merlin

Reputation: 25669

Try this: "> 99"

df[df['price'].gt(99)].index[0]

returns "2", the second index row.

all row indexes greater than 99

df[df['price'].gt(99)].index
Int64Index([2, 5, 6], dtype='int64')

Upvotes: 29

Related Questions