brunoerg

Reputation: 210

Finding the closest matching numbers in dataframe using Pandas/Python

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'Values': [-0.8765, -1, -1.2, 3, 4, 5, -12.0021, 10, 11, 12, -0.982]},
              index = [pd.Timestamp('20130101 09:00:00'),
                       pd.Timestamp('20130101 09:00:02'),
                       pd.Timestamp('20130101 09:00:03'),
                       pd.Timestamp('20130101 09:00:05'),
                       pd.Timestamp('20130101 09:00:06'),
                       pd.Timestamp('20130101 09:00:07'),
                       pd.Timestamp('20130101 09:00:08'),
                       pd.Timestamp('20130101 09:00:09'),
                       pd.Timestamp('20130101 09:00:10'),
                       pd.Timestamp('20130101 09:00:11'),
                       pd.Timestamp('20130101 09:00:12')
                       ])

I need to find a pattern in my DataFrame. For example, I have this pattern:

pattern = [4,5,-12.0021,10] 

Now I run this code:

print(df.iloc[[int(df.index.get_indexer_for((df[df.Values==i].index))) for i in pattern]])

and it returns:

                      Values
2013-01-01 09:00:06   4.0000
2013-01-01 09:00:07   5.0000
2013-01-01 09:00:08 -12.0021
2013-01-01 09:00:09  10.0000

Ok, cool.

But I also need to find SIMILAR patterns in my DataFrame.

So, given the pattern pattern = [4,5,-12.0021,10], suppose my DataFrame contains the values [4,5,-12.01,10.1]. The code does not return them, because it only finds exact matches, but I need it to return similar values too.

What should I use?

Upvotes: 1

Views: 969

Answers (1)

cs95

Reputation: 402922

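A nice solution from a related question recommends NumPy broadcasting: compare every value in the column against every element of the pattern, and keep the rows that fall within a threshold of at least one element.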

import numpy as np

pattern = [4, 5, -12.01, 10.1]
thresh = 0.1

# Keep rows whose value is within thresh of any pattern element.
out = df[(np.abs(df.Values.values[:, None] - pattern) <= thresh).any(axis=1)]
out
                      Values
2013-01-01 09:00:06   4.0000
2013-01-01 09:00:07   5.0000
2013-01-01 09:00:08 -12.0021
2013-01-01 09:00:09  10.0000
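
To make the broadcasting step easier to follow, here is a minimal sketch that unpacks the one-liner into named intermediate arrays. The values are copied from the question; the default integer index (instead of the timestamps) and the names values, diffs, close and row_mask are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's DataFrame (same values, default integer index).
df = pd.DataFrame({'Values': [-0.8765, -1, -1.2, 3, 4, 5, -12.0021, 10, 11, 12, -0.982]})
pattern = np.array([4, 5, -12.01, 10.1])
thresh = 0.1

values = df['Values'].to_numpy()            # shape (11,)
diffs = np.abs(values[:, None] - pattern)   # shape (11, 4): each value vs. each pattern element
close = diffs <= thresh                     # boolean matrix, True where within the threshold
row_mask = close.any(axis=1)                # True if a row is close to at least one element

print(diffs.shape)
print(df[row_mask])
```

values[:, None] turns the column into an (11, 1) array, so subtracting the length-4 pattern broadcasts to an (11, 4) matrix of pairwise differences.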

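Filtering is based on a manually chosen absolute threshold (thresh) that you can tweak. Note that each row is compared against every pattern element independently, so this does not check that the matches appear in the pattern's order or in consecutive rows.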
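
If the pattern has to match as a consecutive run of rows in the same order, a sliding-window comparison is one option. This is only a sketch under a few assumptions: NumPy 1.20 or newer (for sliding_window_view), the same absolute threshold for every position, and a default integer index standing in for the question's timestamps. The names windows, hits and starts are mine.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

# Stand-in for the question's DataFrame (same values, default integer index).
df = pd.DataFrame({'Values': [-0.8765, -1, -1.2, 3, 4, 5, -12.0021, 10, 11, 12, -0.982]})
pattern = np.array([4, 5, -12.01, 10.1])
thresh = 0.1

values = df['Values'].to_numpy()
# One row per window of len(pattern) consecutive values.
windows = sliding_window_view(values, len(pattern))
# A window is a hit only if every position is within thresh of the pattern.
hits = (np.abs(windows - pattern) <= thresh).all(axis=1)
starts = np.flatnonzero(hits)  # positional start of each matching window

for start in starts:
    print(df.iloc[start:start + len(pattern)])
```

Using .all(axis=1) over windows instead of .any(axis=1) over single rows is what enforces both the order and the adjacency of the match.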

Upvotes: 1
