Reputation: 61134
The essence:
This is a follow-up question to Find particular pattern in a pandas dataframe, but now I'm not looking for a fixed pattern. So how can you define a particular start and end to a pattern / time period and subset a pandas dataframe according to that?
The details:
Let's say that you have this dataframe:
ColA ColB
Dates
2017-07-07 103 92
2017-07-08 92 96
2017-07-09 107 109
2017-07-10 100 91
2017-07-11 90 107
2017-07-12 105 99
2017-07-13 90 104
2017-07-14 90 105
2017-07-15 109 104
2017-07-16 94 90
And let's say that the pattern you are looking for starts with [107, 100]
in ColA
and ends with [90, 109]
in the same column (ColB is just there to illustrate that it's a dataframe and not a series). How can you subset the dataframe while not knowing how many observations that lie between them?
Desired output:
ColA ColB
Dates
2017-07-09 107 109
2017-07-10 100 91
2017-07-11 90 107
2017-07-12 105 99
2017-07-13 90 104
2017-07-14 90 105
2017-07-15 109 104
I know how to do this in a rudimentary way using nested for loops, but I'm hoping some of you have got a more elegant solution. Thank you for any suggestions!
Upvotes: 1
Views: 76
Reputation: 164773
For a couple of values, you can use pd.Series.shift
. You will need extra logic to either account for idx2
occurring before idx1
, and to generalize for arbitrary adjacent values.
idx1 = (df['ColA'].eq(107) & df['ColA'].shift(-1).eq(100)).idxmax()
idx2 = (df['ColA'].shift().eq(90) & df['ColA'].eq(109)).idxmax()
print(df.loc[idx1: idx2])
ColA ColB
Dates
2017-07-09 107 109
2017-07-10 100 91
2017-07-11 90 107
2017-07-12 105 99
2017-07-13 90 104
2017-07-14 90 105
2017-07-15 109 104
Upvotes: 1