Reputation: 210
I have a DataFrame with a single column of values and a timestamp index:
import pandas as pd

df = pd.DataFrame({'Values': [-0.8765, -1, -1.2, 3, 4, 5, -12.0021, 10, 11, 12, -0.982]},
                  index=[pd.Timestamp('20130101 09:00:00'),
                         pd.Timestamp('20130101 09:00:02'),
                         pd.Timestamp('20130101 09:00:03'),
                         pd.Timestamp('20130101 09:00:05'),
                         pd.Timestamp('20130101 09:00:06'),
                         pd.Timestamp('20130101 09:00:07'),
                         pd.Timestamp('20130101 09:00:08'),
                         pd.Timestamp('20130101 09:00:09'),
                         pd.Timestamp('20130101 09:00:10'),
                         pd.Timestamp('20130101 09:00:11'),
                         pd.Timestamp('20130101 09:00:12')])
I need to find a pattern in my DataFrame. For example, I have this pattern:
pattern = [4,5,-12.0021,10]
I run this code to look it up:
print(df.iloc[[int(df.index.get_indexer_for((df[df.Values==i].index))) for i in pattern]])
and it returns:
Values
2013-01-01 09:00:06 4.0000
2013-01-01 09:00:07 5.0000
2013-01-01 09:00:08 -12.0021
2013-01-01 09:00:09 10.0000
Ok, cool.
But I also need to find SIMILAR patterns in my DataFrame.
For example, the pattern is pattern = [4,5,-12.0021,10], but the DataFrame contains the values [4,5,-12.01,10.1]. The code above returns nothing in that case, because it only matches exactly equal values, and I need it to return similar values too.
What should I use?
Upvotes: 1
Views: 969
Reputation: 402922
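A nice solution from a related question uses NumPy broadcasting: compare every
value in the column against every element of the pattern, and keep the rows
whose value is within a threshold of at least one pattern element.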
import numpy as np

pattern = [4, 5, -12.01, 10.1]
thresh = 0.1
# Column vector minus pattern broadcasts to shape (len(df), len(pattern))
out = df[(np.abs(df.Values.values[:, None] - pattern) <= thresh).any(axis=1)]
out
Values
2013-01-01 09:00:06 4.0000
2013-01-01 09:00:07 5.0000
2013-01-01 09:00:08 -12.0021
2013-01-01 09:00:09 10.0000
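The filter keeps every row whose value lies within thresh of at least one pattern element, so you can tweak thresh to make the match stricter or looser. Note that it does not check that the matched rows are adjacent or appear in the pattern's order.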
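If the pattern has to appear as a run of consecutive rows in the same order, one possible variation is to compare sliding windows of the column against the whole pattern. The sketch below is only an illustration under that assumption: find_similar_runs is a hypothetical helper name, not something from pandas or NumPy, and it relies on numpy.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame(
    {'Values': [-0.8765, -1, -1.2, 3, 4, 5, -12.0021, 10, 11, 12, -0.982]},
    index=pd.to_datetime([
        '2013-01-01 09:00:00', '2013-01-01 09:00:02', '2013-01-01 09:00:03',
        '2013-01-01 09:00:05', '2013-01-01 09:00:06', '2013-01-01 09:00:07',
        '2013-01-01 09:00:08', '2013-01-01 09:00:09', '2013-01-01 09:00:10',
        '2013-01-01 09:00:11', '2013-01-01 09:00:12',
    ]),
)


def find_similar_runs(values, pattern, thresh):
    """Hypothetical helper: return the start positions of every run of
    len(pattern) consecutive values that are all within thresh of the
    corresponding pattern element."""
    values = np.asarray(values, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    if len(values) < len(pattern):
        return np.array([], dtype=int)
    # One row per window: shape (len(values) - len(pattern) + 1, len(pattern))
    windows = sliding_window_view(values, len(pattern))
    # A window matches only if every element is close to its pattern element
    close = np.abs(windows - pattern) <= thresh
    return np.flatnonzero(close.all(axis=1))


pattern = [4, 5, -12.01, 10.1]
thresh = 0.1

starts = find_similar_runs(df['Values'].to_numpy(), pattern, thresh)

# Collect the rows of the first matching run, if there is one
if len(starts) > 0:
    matched = df.iloc[starts[0]:starts[0] + len(pattern)]
else:
    matched = df.iloc[0:0]
print(matched)
```

Unlike the any-element filter, this only reports positions where all pattern elements match in order, so a stray 4 or 5 elsewhere in the column would not be picked up on its own.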
Upvotes: 1