Pandas, Python. How to filter out days depending on number of observations?

Question

I would like to filter out days that have fewer than 200 minute observations in them. My data looks as follows:

    Time                                   
    2009-01-30 09:30:00   85.1100  100.1100
    2009-01-30 09:39:00   84.9300  100.0500
    2009-01-30 09:40:00   84.9000  100.0000
    2009-01-30 09:45:00   84.9100   99.9400
    2009-01-30 09:48:00   84.8100   99.9000
    2009-01-30 09:55:00   84.7800  100.0000
                          ...       ...
    2016-02-29 15:58:00  193.7200   24.8300
    2016-02-29 15:59:00  193.4800   24.8700
    2016-02-29 16:00:00  193.6100   24.8300
    2016-03-01 09:30:00  195.2200   24.3099
    2016-03-01 09:31:00  195.1000   24.3300
    2016-03-01 09:32:00  195.1500   24.3100
    2016-03-01 09:33:00  195.1100   24.3800

The first column is a DatetimeIndex. As you have probably noticed, this is minute data, and some minutes are missing from the dataset. I would like to avoid resampling to minute frequency and dealing with NA values, and instead find a way of filtering out days based on the index: a day with more than 200 minute observations stays, and a day with fewer than 200 is dropped.

MaxU - stand with Ukraine · Accepted Answer

Assuming that Time is a column (not an index), try something like the following:

    df.loc[df.groupby(df['Time'].dt.date)['col1'].transform('count') > 200]

where col1 is a column name.

If Time is the index:

    df.loc[df.groupby(df.index.date)['col1'].transform('count') > 200]

UPDATE: starting from pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers, and it was removed entirely in pandas 1.0.

So use df.loc[...] rather than df.ix[...].
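
To make the index-based variant concrete, here is a minimal, self-contained sketch. The sample data is made up for illustration (one full trading day and one sparse day), the names col1, col2 and MIN_OBS are placeholders, and the groupby(...).filter(...) line is an alternative spelling that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one complete day of minute bars (391 rows,
# 09:30-16:00 inclusive) and one sparse day keeping every third minute (131 rows).
full_day = pd.date_range("2016-02-29 09:30", "2016-02-29 16:00", freq="min")
sparse_day = pd.date_range("2016-03-01 09:30", "2016-03-01 16:00", freq="min")[::3]

index = full_day.append(sparse_day)
index.name = "Time"

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"col1": rng.normal(size=len(index)), "col2": rng.normal(size=len(index))},
    index=index,
)

MIN_OBS = 200  # threshold from the question; a day must have MORE than this many rows

# Per-row count of non-NaN col1 values on that row's calendar day.
rows_per_day = df.groupby(df.index.date)["col1"].transform("count")

# Boolean mask keeps only rows belonging to days above the threshold.
filtered = df.loc[rows_per_day > MIN_OBS]

# Alternative (an assumption, not from the answer): let groupby.filter drop whole days.
filtered_alt = df.groupby(df.index.date).filter(lambda day: len(day) > MIN_OBS)

print(filtered.index.min(), filtered.index.max(), len(filtered))
```

Note that 'count' ignores NaN values in col1, so pick a column without missing values if every row should count. The comparison is strict, so a day with exactly 200 observations is dropped.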
