Python_Learner
Python_Learner

Reputation: 1637

Pandas find sequence or pattern in column

Here's some example data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1

The Sales_Growth column tells me if there was indeed sales growth in the quarter or not. 0 = no growth, 1 = growth.

First, I'm trying to return the first Quarter when there were two consecutive quarters of no sales growth.

With the data above this answer would be 2001q1.

Then, I want to return the 2nd quarter of consecutive sales growth that occurs AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I've searched but the closest answer I can find I can't get to work: https://stackoverflow.com/a/26539166/3225420

I am a Pandas beginner.

Upvotes: 8

Views: 12196

Answers (3)

Bill G
Bill G

Reputation: 74

Building on the earlier answers. Q1:

temp = df.Sales_Growth.rolling_apply(window=2, min_periods=2, \
    kwargs={pattern: [0,0]}, func=lambda x, pattern: x == pattern)
print(df[temp==1].head())

In the rolling_apply call, window and min_periods must match the length of the pattern list being passed to the rolling_apply function.

Q2: Same approach, different pattern:

temp = df.Sales_Growth.rolling_apply(window=4, min_periods=4, \
    kwargs={pattern: [0,0,1,1]}, func=lambda x, pattern: x == pattern)
print(df[temp==1].head())

Upvotes: 3

John Zwinck
John Zwinck

Reputation: 249153

You're doing subsequence matching. This is a bit strange, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

That gives you:

'00100011'

Then:

growth.index('0011')

Gives you 4 (obviously you'd add a constant 3 to get the index of the last row matched by the pattern).

I feel this approach starts off a bit ugly, but the end result is really usable--you can search for any fixed pattern with no additional coding.

Upvotes: 9

languitar
languitar

Reputation: 6784

For Q1:

temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)

For Q2:

df[(df.Sales_Growth == 1) & (df.Sales_Growth.shift(1) == 1) & (df.Sales_Growth.shift(2) == 0) & (df.Sales_Growth.shift(3) == 0)].head(1)

Upvotes: 9

Related Questions