Thomas
Thomas

Reputation: 12117

detecting value crossing between columns in Pandas

Let's say I have the following dataframe:

df = pd.DataFrame({'a': [10, 20, 30, 40, 50], 'b': [0, 10, 40, 45, 50]}, columns = ['a', 'b'])

I would like to make a list of indices where:

a [i - 1] < b[i] and a[i] >= b[i]

in order to detect when a value, in a timeseries, is crossing another one

is there a Pandas idiomatic way to achieve this without iterating through all the elements?

I tried to create a new column with flags to indicate a crossing by doing this:

df['t'] = (df['a'].shift(1).values < df['b'].values and di['a'].values >= df['b']).astype(bool)

but that won't compile. I'm not sure how to approach this problem, short of doing a loop through all the elements.

Upvotes: 4

Views: 1290

Answers (2)

n4321d
n4321d

Reputation: 1205

A similar solution would be to use a condition in combination with Series.diff: This will basically test where you condition changed.
I like this because I feel it is a little easier to read, it also might be a little faster.

NOTE: you need to convert the type of the condition to int otherwise you get True for changes up and downwards (pandas .diff uses xor for bool instead of sub, see link above)

Find the positions where df.a crossed df.b upwards:

mask = df.a.gt(df.b).astype('i1').diff().gt(0)

or if you find this easier to read:

mask = ((df.a > df.b).astype('i1').diff() > 0)

output:
enter image description here

# to get the index
idx = df.loc[mask].index

output: enter image description here

Upvotes: 0

Erfan
Erfan

Reputation: 42916

You can use the Series.shift with Series.lt which is "less than", same as < and Series.ge which is "greater than or equal" and is same as >=:

mask = df['a'].shift().lt(df['b']) & df['a'].ge(df['b'])
# same as (df['A'].shift() < df['b']) & (df['a'] >= df['b'])

0    False
1    False
2    False
3    False
4     True
dtype: bool

Notice, we don't have to specify astype(bool), pandas works with boolean indexing and returns booleans when defining conditions.


To get the indices of the rows with True, use:

idx = df[mask].index.tolist()

print(idx)
[4]

Upvotes: 4

Related Questions