Reputation: 1129
I would like to filter out days that have fewer than 200 minute observations in them. My data looks as follows:
Time
2009-01-30 09:30:00 85.1100 100.1100
2009-01-30 09:39:00 84.9300 100.0500
2009-01-30 09:40:00 84.9000 100.0000
2009-01-30 09:45:00 84.9100 99.9400
2009-01-30 09:48:00 84.8100 99.9000
2009-01-30 09:55:00 84.7800 100.0000
... ...
2016-02-29 15:58:00 193.7200 24.8300
2016-02-29 15:59:00 193.4800 24.8700
2016-02-29 16:00:00 193.6100 24.8300
2016-03-01 09:30:00 195.2200 24.3099
2016-03-01 09:31:00 195.1000 24.3300
2016-03-01 09:32:00 195.1500 24.3100
2016-03-01 09:33:00 195.1100 24.3800
The first column is a DatetimeIndex. As you have probably noticed, this is minute data, and some minutes are missing from the dataset. I would like to avoid resampling to minute frequency and dealing with NA values, and instead find a way of filtering out days based on the index: a day with more than 200 minute observations stays, and a day with fewer than 200 is dropped.
Upvotes: 1
Views: 407
Reputation: 210832
Assuming that Time is a column (not an index), try something like the following:
df.ix[df.groupby(df['Time'].dt.date)['col1'].transform('count') > 200]
where col1 is a column name.
If the Time column is an index:
df.ix[df.groupby(df.index.date)['col1'].transform('count') > 200]
UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers (and it was removed entirely in pandas 1.0), so use df.loc[...] instead of the deprecated df.ix[...].
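For reference, here is a minimal, self-contained sketch of the index-based variant written with df.loc. The sample data, the column names col1 and col2, and the variable names obs_per_day and filtered are made up for illustration; only the groupby/transform pattern comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one full trading day (391 minute bars)
# and one sparse day (50 minute bars). Column names are placeholders.
full_day = pd.date_range("2016-02-29 09:30", "2016-02-29 16:00", freq="min")
sparse_day = pd.date_range("2016-03-01 09:30", periods=50, freq="min")
idx = full_day.append(sparse_day)

df = pd.DataFrame(
    {
        "col1": np.linspace(190.0, 195.0, len(idx)),
        "col2": np.linspace(24.0, 25.0, len(idx)),
    },
    index=idx,
)

# Count the non-null col1 values in each calendar day and broadcast
# that count back onto every row of the same day.
obs_per_day = df.groupby(df.index.date)["col1"].transform("count")

# Keep only the rows that belong to days with more than 200 observations.
filtered = df.loc[obs_per_day > 200]
```

With this sample, only the 2016-02-29 rows should survive. Note that 'count' ignores NaN values in col1; if the chosen column can contain missing values and every row should be counted, transform('size') may be the better fit.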
Upvotes: 3