Thomas Fauskanger

Reputation: 2656

Why is pandas time series resample raising IncompatibleFrequency error?

The problem

I have a pandas DataFrame with five years of time series data starting from 2006. Its index is a PeriodIndex, converted automatically from Periods made with pd.period_range(), as seen in the code block below.

There, I want to resample() the first four years, using the time series offset aliases mentioned in the docs. With freq='1W' it works, but with a frequency of '2W' (and likewise for 3 weeks) I get an error that says

IncompatibleFrequency: Input has different freq=2W-SUN from PeriodIndex(freq=W-SUN)

which is mentioned in the Periods part of the time series docs and it says:

Adding and subtracting integers from periods shifts the period by its own frequency. Arithmetic is not allowed between Period with different freq (span).
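
My reading of that passage, as a minimal sketch (the monthly and daily periods here are made up for illustration and are not from my dataset): integer arithmetic moves a Period by its own frequency, while combining two Periods of different frequency raises the error. I catch ValueError on the assumption that IncompatibleFrequency subclasses it.

```python
import pandas as pd

# Illustrative periods only; not taken from the dataset in the question.
monthly = pd.Period("2006-01", freq="M")
daily = pd.Period("2006-01-01", freq="D")

# Adding an integer shifts the period by its own frequency (one month here).
shifted = monthly + 1

# Arithmetic between periods of different frequency is rejected.
raised = False
try:
    monthly - daily
except ValueError as err:  # IncompatibleFrequency is assumed to be a ValueError subclass
    raised = True
    print(type(err).__name__, err)
```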

Honestly, I'm not sure how this relates to my issue.

The general form of the error is that if my freq=XY, it gives Input has different freq=XY from PeriodIndex(freq=Y), unless X is 1.

The data

The original dataset is from a csv-file with multiple columns, but in the example I only have a single column A with the same number of rows.

import numpy as np
import pandas as pd

# dummy DataFrame with 87648 rows (half-hourly values over five years)
df = pd.DataFrame(dict(A=np.random.randint(1, 101, size=87648)))
# Add periods column, set as index
df['time'] = pd.period_range(start='2006-01-01 00:30', freq='30min', end='2011-01-01')
df = df.set_index('time')

Now, if I type df.index in e.g. IPython, I get the following output:

PeriodIndex(['2006-01-01 00:30', '2006-01-01 01:00', '2006-01-01 01:30',
             '2006-01-01 02:00', '2006-01-01 02:30', '2006-01-01 03:00',
             '2006-01-01 03:30', '2006-01-01 04:00', '2006-01-01 04:30',
             '2006-01-01 05:00',
             ...
             '2010-12-31 19:30', '2010-12-31 20:00', '2010-12-31 20:30',
             '2010-12-31 21:00', '2010-12-31 21:30', '2010-12-31 22:00',
             '2010-12-31 22:30', '2010-12-31 23:00', '2010-12-31 23:30',
             '2011-01-01 00:00'],
            dtype='period[30T]', name='time', length=87648, freq='30T')

This matches my expectations and the data in the CSV file it was loaded from.

The attempt(s)

# This works
df['A'].loc['2006':'2009'].resample('1W').mean().plot()

# This gives error mentioned above
df['A'].loc['2006':'2009'].resample('2W').mean().plot()

Additional thoughts

There are obviously situations where certain periods won't work, but for half-hourly data over several years, I'd expect it to be possible to resample to any lower frequency, such as an arbitrary number of hours, days, weeks or months.

According to this answer, the following is a better approach:

df['A'].resample('D').interpolate()[::7]

but that gives me an InvalidIndexError: Reindexing only valid with uniquely valued Index objects. (I assume that there are duplicate index values at the hours where daylight saving time switches from summer to winter.)

Also, I'm under the impression that pandas aims to do such "heavy lifting" for us, and I assume that a deeper understanding would let users get this done without such workarounds.

There are several posts on SO about resampling, but when I searched for "IncompatibleFrequency" and "Input has different freq" I found no other posts on this error.

The question

I would like to understand why the error is raised, and how to resolve the issue of resampling to arbitrary periods - or at least to understand the limitations.

Upvotes: 2

Views: 3022

Answers (1)

root

Reputation: 33843

This is a bug with plot(), not resample(), and has been reported on GitHub (#14763).

As a workaround until the bug is fixed, you can convert your index to a DatetimeIndex with to_timestamp prior to plotting:

df.loc['2006':'2009', 'A'].resample('2W').mean().to_timestamp().plot()

Note that you may want to adjust the freq or how parameters of to_timestamp. See the docs for additional details on those parameters.
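
A minimal sketch of a related variation, in case it helps: here the index is converted with to_timestamp before resampling rather than after, so the resample itself runs on a DatetimeIndex. This is a swapped-in ordering, not the exact line above, and the 8-week series is a small stand-in I made up for the question's data. The how='start' argument is spelled out only to show where that parameter goes.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data: 8 weeks of half-hourly values.
idx = pd.period_range(start="2006-01-01 00:30", periods=8 * 7 * 48, freq="30min")
s = pd.Series(np.random.randint(1, 101, size=len(idx)), index=idx, name="A")

# Convert the PeriodIndex to a DatetimeIndex, anchoring each period at its start.
ts = s.to_timestamp(how="start")

# Resample into two-week bins and average; the result has a DatetimeIndex,
# so calling .plot() on it does not involve a PeriodIndex at all.
two_week_mean = ts.resample("2W").mean()
print(two_week_mean)
```

Passing how='end' instead would anchor each timestamp at the end of its period, which can shift values that sit on a bin boundary into the neighbouring bin.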

Upvotes: 1
