Reputation: 79
I am working with timeseries data in Pandas (timestamp used as index). I am doing some filtering on my dataset and end up with a dataframe that mostly contains consecutive observations (one-minute data). However, there are also time intervals with only one or a few minutes of obervations. These I would like to exclude. How can I get hold of those short intervals using sth like:
df = df.drop(df[<some boolean condition>].index)
timestamp value
2018-01-08 06:13:00 143
2018-01-08 06:14:00 324
2018-01-08 06:15:00 324
2018-01-08 06:16:00 324
2018-01-08 06:17:00 324
2018-01-08 06:20:00 324(remove)
2018-01-08 06:35:00 324
2018-01-08 06:36:00 324
2018-01-08 06:37:00 324
2018-01-08 06:38:00 324
2018-01-08 06:39:00 324
2018-01-08 06:40:00 324
Upvotes: 1
Views: 625
Reputation: 863741
Use:
#convert index to Series
s = df.index.to_series()
#test if 1 Minute difference, then cumulative sum
a = s.diff().ne(pd.Timedelta(1, unit='Min')).cumsum()
#filter if counts of cumulative value greater like N, e.g. 3
N = 3
df = df[a.map(a.value_counts()).gt(N)]
print (df)
value
timestamp
2018-01-08 06:13:00 143
2018-01-08 06:14:00 324
2018-01-08 06:15:00 324
2018-01-08 06:16:00 324
2018-01-08 06:17:00 324
2018-01-08 06:35:00 324
2018-01-08 06:36:00 324
2018-01-08 06:37:00 324
2018-01-08 06:38:00 324
2018-01-08 06:39:00 324
2018-01-08 06:40:00 324
Upvotes: 1