Find pattern with defined start and end, but with unknown length

Question

The essence:

This is a follow-up to the question Find particular pattern in a pandas dataframe, but now I'm not looking for a fixed pattern. How can you define a specific start and end for a pattern (or time period) and subset a pandas dataframe accordingly?

The details:

Let's say that you have this dataframe:

            ColA  ColB
Dates                 
2017-07-07   103    92
2017-07-08    92    96
2017-07-09   107   109
2017-07-10   100    91
2017-07-11    90   107
2017-07-12   105    99
2017-07-13    90   104
2017-07-14    90   105
2017-07-15   109   104
2017-07-16    94    90
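To make the example reproducible, here is one way to rebuild the frame above. This is a sketch that assumes consecutive daily dates and an index named Dates, as the printout suggests; the values are copied from the table.

```python
import pandas as pd

# Rebuild the example frame; values are copied from the printout above,
# and the index is assumed to be consecutive daily dates named "Dates".
df = pd.DataFrame(
    {
        "ColA": [103, 92, 107, 100, 90, 105, 90, 90, 109, 94],
        "ColB": [92, 96, 109, 91, 107, 99, 104, 105, 104, 90],
    },
    index=pd.date_range("2017-07-07", periods=10, freq="D", name="Dates"),
)
print(df)
```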

And let's say that the pattern you are looking for starts with [107, 100] in ColA and ends with [90, 109] in the same column (ColB is just there to illustrate that it's a dataframe and not a series). How can you subset the dataframe without knowing how many observations lie between the start and the end?

Desired output:

            ColA  ColB
Dates                 
2017-07-09   107   109
2017-07-10   100    91
2017-07-11    90   107
2017-07-12   105    99
2017-07-13    90   104
2017-07-14    90   105
2017-07-15   109   104

I know how to do this in a rudimentary way using nested for loops, but I'm hoping some of you have got a more elegant solution. Thank you for any suggestions!

jpp · Accepted Answer

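For a pattern of just two values, you can use pd.Series.shift to line each row up with its neighbour, then idxmax to get the label of the first row where the boolean mask is True. You will need extra logic both to account for idx2 occurring before idx1 and to generalize to an arbitrary number of adjacent values.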

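# Label of the first row where ColA is 107 and the next row's ColA is 100
idx1 = (df['ColA'].eq(107) & df['ColA'].shift(-1).eq(100)).idxmax()
# Label of the first row where the previous row's ColA is 90 and this row's ColA is 109
idx2 = (df['ColA'].shift().eq(90) & df['ColA'].eq(109)).idxmax()

# Label-based slicing with .loc includes both endpoints
print(df.loc[idx1:idx2])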

            ColA  ColB
Dates                 
2017-07-09   107   109
2017-07-10   100    91
2017-07-11    90   107
2017-07-12   105    99
2017-07-13    90   104
2017-07-14    90   105
2017-07-15   109   104
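The "extra logic" mentioned above could look something like the following. This is a sketch, not the answerer's code: it swaps the shift approach for NumPy's sliding_window_view (assumes NumPy 1.20 or later), and the helper names find_pattern_starts and subset_between are hypothetical. It accepts start and end patterns of any length, only accepts an end that begins after the start pattern has finished (a design choice), and raises instead of silently returning a wrong slice. That last point matters because idxmax on an all-False mask returns the first label, so the shift version cannot tell "pattern at row 0" from "pattern not found".

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_pattern_starts(values, pattern):
    """Hypothetical helper: positions where `pattern` appears as consecutive values."""
    values = np.asarray(values)
    pattern = np.asarray(pattern)
    if len(pattern) == 0 or len(pattern) > len(values):
        return np.array([], dtype=int)
    # Each row of `windows` is one run of len(pattern) consecutive values
    windows = sliding_window_view(values, len(pattern))
    return np.flatnonzero((windows == pattern).all(axis=1))


def subset_between(df, col, start, end):
    """Hypothetical helper: rows from the first `start` match through the first
    `end` match that begins after the `start` pattern has finished."""
    vals = df[col].to_numpy()
    starts = find_pattern_starts(vals, start)
    if starts.size == 0:
        raise ValueError(f"start pattern {start} not found in {col}")
    first = starts[0]
    ends = find_pattern_starts(vals, end)
    ends = ends[ends >= first + len(start)]  # ignore ends before (or overlapping) the start
    if ends.size == 0:
        raise ValueError(f"end pattern {end} not found after the start pattern")
    last = ends[0] + len(end) - 1  # position of the last value of the end pattern
    # Positional slicing, so duplicate or non-sorted index labels are not a problem
    return df.iloc[first:last + 1]


# Usage with the frame from the question
df = pd.DataFrame(
    {
        "ColA": [103, 92, 107, 100, 90, 105, 90, 90, 109, 94],
        "ColB": [92, 96, 109, 91, 107, 99, 104, 105, 104, 90],
    },
    index=pd.date_range("2017-07-07", periods=10, freq="D", name="Dates"),
)
result = subset_between(df, "ColA", [107, 100], [90, 109])
print(result)
```

With the question's data this selects the same seven rows, 2017-07-09 through 2017-07-15.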
