Reputation: 2656
I have a pandas DataFrame with time series data for five years starting from 2006, to which I add a PeriodIndex that is automatically converted from Periods made with pd.period_range(), as seen in the code block below.
There, I want to resample() the first four years, and I've used the time series offset aliases mentioned in the docs. When I use freq=1W it works, but with e.g. a frequency of 2W (or likewise for 3 weeks) I get an error that says
IncompatibleFrequency: Input has different freq=2W-SUN from PeriodIndex(freq=W-SUN)
This error is mentioned in the Periods part of the time series docs, which says:
Adding and subtracting integers from periods shifts the period by its own frequency. Arithmetic is not allowed between Period with different freq (span).
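To make the quoted rule concrete, here is a minimal sketch. The periods p and q are made up for illustration and are not taken from my data. The exception is caught broadly because I'm not certain which base class IncompatibleFrequency uses across pandas versions.

```python
import pandas as pd

# Adding an integer shifts the period by its own frequency (one month here)
p = pd.Period('2006-01', freq='M')
shifted = p + 1
print(shifted)

# Arithmetic between periods of different frequency is rejected
q = pd.Period('2006-01-01', freq='D')
try:
    p - q
    raised = False
except (ValueError, TypeError) as err:
    raised = True
    print(type(err).__name__, err)
```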
Honestly, I'm not sure how this relates to my issue.
The general form of the error is that if my freq=XY, it gives Input has different freq=XY from PeriodIndex(freq=Y), unless X is 1.
The original dataset is from a CSV file with multiple columns, but in the example I only have a single column A with the same number of rows.
import numpy as np
import pandas as pd
# dummy DataFrame with 87648 rows
df = pd.DataFrame(dict(A=np.random.randint(1, 101, size=87648)))
# Add periods column, set as index
df['time'] = pd.period_range(start='2006-01-01 00:30', freq='30min', end='2011-01-01')
df = df.set_index('time')
Now, if I type df.index in e.g. IPython, I get the following output:
PeriodIndex(['2006-01-01 00:30', '2006-01-01 01:00', '2006-01-01 01:30',
'2006-01-01 02:00', '2006-01-01 02:30', '2006-01-01 03:00',
'2006-01-01 03:30', '2006-01-01 04:00', '2006-01-01 04:30',
'2006-01-01 05:00',
...
'2010-12-31 19:30', '2010-12-31 20:00', '2010-12-31 20:30',
'2010-12-31 21:00', '2010-12-31 21:30', '2010-12-31 22:00',
'2010-12-31 22:30', '2010-12-31 23:00', '2010-12-31 23:30',
'2011-01-01 00:00'],
dtype='period[30T]', name='time', length=87648, freq='30T')
This seems to match my expectations and the data in the CSV file it was loaded from:
# This works
df['A'].loc['2006':'2009'].resample('1W').mean().plot()
# This gives error mentioned above
df['A'].loc['2006':'2009'].resample('2W').mean().plot()
Further:
The same error occurs with freq=6M, but it works if I do freq=1M. (Input has different freq=6M from PeriodIndex(freq=M))
Likewise for 7D, which according to my expectations should be the same as 1W.
There are obviously situations where certain periods won't work, but for half-hour data over several years, I'd expect that it would be possible to resample to any lower frequency, such as an arbitrary number of hours, days, weeks or months.
According to this answer, the following is a better approach:
df['A'].resample('D').interpolate()[::7]
but that gives me an InvalidIndexError: Reindexing only valid with uniquely valued Index objects. (I assume that there are duplicate index values at the hours when the clocks change from summer to winter time under daylight saving time.)
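One way to test that assumption would be to look for repeated index values directly. The sketch below uses a tiny hand-made PeriodIndex with one repeated half-hour, purely for illustration. On the real data the same two calls would be made on df.index.

```python
import pandas as pd

# Hypothetical index with one repeated half-hour, for illustration only
idx = pd.PeriodIndex(
    ['2006-10-29 02:00', '2006-10-29 02:30', '2006-10-29 02:30'],
    freq='30min',
)

# False as soon as any index value occurs more than once
print(idx.is_unique)

# The entries that repeat an earlier value
dupes = idx[idx.duplicated()]
print(dupes)
```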
Also, I'm under the impression that pandas aims to do such "heavy lifting" for us, and I assume that a deeper understanding would enable users to utilize it without such workarounds.
There are several posts on SO about resampling, but when I searched for "IncompatibleFrequency" and "Input has different freq", there seemed to be no other posts on this error.
I would like to understand why the error is raised, and how to resolve the issue of resampling to arbitrary periods - or at least to understand the limitations.
Upvotes: 2
Views: 3022
Reputation: 33843
This is a bug with plot(), not resample(), and has been reported on GitHub (#14763).
As a workaround until the bug is fixed, you can convert your index to a DatetimeIndex with to_timestamp prior to plotting:
df.loc['2006':'2009', 'A'].resample('2W').mean().to_timestamp().plot()
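A variation on the same idea, offered as a rough sketch rather than something verified against the original data: convert to a DatetimeIndex first and resample afterwards, so that everything downstream works on timestamps. The constant-valued series s is a made-up stand-in for df['A'], and the bin edges may differ slightly from those produced when resampling periods.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['A']: eight weeks of half-hourly periods
idx = pd.period_range(start='2006-01-02 00:00', periods=8 * 7 * 48, freq='30min')
s = pd.Series(np.ones(len(idx)), index=idx, name='A')

# Convert the PeriodIndex to a DatetimeIndex, then resample on timestamps
s_ts = s.to_timestamp()
two_weekly = s_ts.resample('2W').mean()

print(type(two_weekly.index).__name__)  # DatetimeIndex
# two_weekly.plot() would then be drawing from a DatetimeIndex
```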
Note that you may want to adjust the freq or how parameters of to_timestamp. See the docs for additional details on those parameters.
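As a small illustration of what how changes, the sketch below uses a hypothetical three-week series in place of the resampled result. With how='start' each period is labelled by its first instant, with how='end' by its last.

```python
import pandas as pd

# Hypothetical weekly result, standing in for the output of resample(...).mean()
weekly = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.period_range(start='2006-01-02', periods=3, freq='W-SUN'),
)

at_start = weekly.to_timestamp(how='start')  # Monday at the start of each week
at_end = weekly.to_timestamp(how='end')      # Sunday at the end of each week

print(at_start.index[0])
print(at_end.index[0])
```

Which of the two is appropriate depends on where you want each point to sit on the time axis of the plot.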
Upvotes: 1