Reputation: 12117
Let's say I have the following dataframe:
df = pd.DataFrame({'a': [10, 20, 30, 40, 50], 'b': [0, 10, 40, 45, 50]}, columns = ['a', 'b'])
I would like to make a list of indices where:
a [i - 1] < b[i] and a[i] >= b[i]
in order to detect when a value, in a timeseries, is crossing another one
is there a Pandas idiomatic way to achieve this without iterating through all the elements?
I tried to create a new column with flags to indicate a crossing by doing this:
df['t'] = (df['a'].shift(1).values < df['b'].values and di['a'].values >= df['b']).astype(bool)
but that won't compile. I'm not sure how to approach this problem, short of doing a loop through all the elements.
Upvotes: 4
Views: 1290
Reputation: 1205
A similar solution would be to use a condition in combination with Series.diff: This will basically test where you condition changed.
I like this because I feel it is a little easier to read, it also might be a little faster.
NOTE: you need to convert the type of the condition to int otherwise you get True for changes up and downwards (pandas .diff uses xor for bool instead of sub, see link above)
Find the positions where df.a crossed df.b upwards:
mask = df.a.gt(df.b).astype('i1').diff().gt(0)
or if you find this easier to read:
mask = ((df.a > df.b).astype('i1').diff() > 0)
# to get the index
idx = df.loc[mask].index
Upvotes: 0
Reputation: 42916
You can use the Series.shift
with Series.lt
which is "less than", same as <
and Series.ge
which is "greater than or equal" and is same as >=
:
mask = df['a'].shift().lt(df['b']) & df['a'].ge(df['b'])
# same as (df['A'].shift() < df['b']) & (df['a'] >= df['b'])
0 False
1 False
2 False
3 False
4 True
dtype: bool
Notice, we don't have to specify astype(bool)
, pandas works with boolean indexing
and returns booleans
when defining conditions.
To get the indices
of the rows with True
, use:
idx = df[mask].index.tolist()
print(idx)
[4]
Upvotes: 4