Reputation: 163
I have a dataframe with a time series in one single column. The data looks like this chart
I would like to create a mask that is TRUE whenever the data is equal to or lower than -0.20. It should also be TRUE before and after reaching -0.20, for as long as the data stays negative. This version of the chart
is my manual attempt to show (in red) the values where the mask would be TRUE. I started creating the mask, but I could only make it TRUE while the data is less than -0.20: mask = (df['data'] < -0.2). I couldn't do any better. Does anybody know how to achieve my goal?
Upvotes: 1
Views: 405
Reputation: 656
Group by consecutive values of the same sign, and then check whether the minimum of each such group is less than the defined threshold.
First, we want to separate negative from positive values.
negative_mask = (df['data']<0)
We then can create classes (ordered with integers) for each consecutive positive or negative series. The class increases by one each time the data changes sign.
consecutives = negative_mask.diff().ne(0).cumsum()
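To see what these labels look like, here is a small sketch on a hand-made series. The toy values are invented for illustration, and the sketch compares each flag with its shifted copy instead of calling diff(), which should give the same labels:

```python
import pandas as pd

# Toy values, invented for illustration only
toy = pd.DataFrame({"data": [0.1, -0.05, -0.25, -0.1, 0.2, -0.1, 0.3]})

negative_mask = toy["data"] < 0
# A row starts a new run whenever its sign flag differs from the previous row's
consecutives = negative_mask.ne(negative_mask.shift()).cumsum()

print(consecutives.tolist())  # [1, 2, 2, 2, 3, 4, 5]
```

The three negative values in the middle share label 2, so they are treated as one group in the next step.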
We then select only the data where the minimum of the group of consecutive elements is less than -0.2.
df.groupby(consecutives).filter(lambda g: g["data"].min() < -0.2)
We can try our example with random data:
import numpy as np
import pandas as pd
np.random.seed(42)
data = np.random.randint(-300, 300, size=1000)/1000
df = pd.DataFrame(data, columns=["data"])
data
2 -0.030
3 -0.194
4 -0.229
5 -0.280
6 -0.179
... ...
991 -0.293
995 -0.247
996 -0.062
997 -0.072
999 -0.250
363 rows × 1 columns
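Since the question asks for a boolean mask rather than a filtered DataFrame, one possible variant is to broadcast each group's minimum back onto its rows with transform. This is only a sketch under the same setup as above; run_min and mask are names I made up, and the run labels are built with a shift comparison rather than diff():

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = np.random.randint(-300, 300, size=1000) / 1000
df = pd.DataFrame(data, columns=["data"])

negative_mask = df["data"] < 0
# Label runs of consecutive same-sign values
consecutives = negative_mask.ne(negative_mask.shift()).cumsum()

# Minimum of each run, repeated on every row of that run
run_min = df["data"].groupby(consecutives).transform("min")

# True on every row of a negative run that dips below -0.2;
# non-negative runs have a minimum >= 0, so they are never selected
mask = run_min < -0.2
```

df[mask] should then select the same rows as the filter call, while mask itself keeps the original length and index. If values exactly equal to -0.2 should count as well, as the question states, use <= instead of <.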
Upvotes: 3
Reputation: 642
One approach could be to group segments that are entirely below zero, and then for each group verify whether or not there are any values below -0.2.
See below for a full reproducible example script:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(167)
df = pd.DataFrame(
{"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(10 ** 5)])}
)
plt.plot(df)
# flag values below zero, then label each run of consecutive same-sign values
lt_zero = df["y"] < 0
regions = (lt_zero != lt_zero.shift()).cumsum()
# here's your interesting DataFrame with the specified mask
df_interesting = df.groupby(regions).filter(lambda g: g["y"].min() < -0.2)
# plot individual regions
for i, grp in df.groupby(regions):
    if grp["y"].min() < -0.2:
        plt.plot(grp, color="tab:red", linewidth=5, alpha=0.6)
plt.axhline(0, linestyle="--", color="tab:gray")
plt.axhline(-0.2, linestyle="--", color="tab:gray")
plt.show()
Upvotes: 5