clstaudt
clstaudt

Reputation: 22438

pandas.DatetimeIndex frequency is None and can't be set

I created a DatetimeIndex from a "date" column:

sales.index = pd.DatetimeIndex(sales["date"])

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
                   '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
                   '2003-01-11', '2003-01-13',
                   ...
                   '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
                   '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
                   '2016-07-30', '2016-07-31'],
                  dtype='datetime64[ns]', name='date', length=4393, freq=None)

As you see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                          'dates does not conform to passed '
    400                                          'frequency {1}'
--> 401                                          .format(inferred, freq.freqstr))
    402 
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

So apparently a frequency has been inferred, but is stored neither in the freq nor inferred_freq attribute of the DatetimeIndex - both are None. Can someone clear up the confusion?

Upvotes: 34

Views: 61698

Answers (6)

Cord Kaldemeyer
Cord Kaldemeyer

Reputation: 6897

It seems to be an issue with missing values in the index. I have simply re-build the index based on the original index in the frequency I needed:

df.index = pd.date_range(start=df.index[0], end=df.index[-1], freq="h")

Upvotes: 1

Luca Guarro
Luca Guarro

Reputation: 1168

Similar to some of the other answers here, my problem was that my data had missing dates.

Instead of dealing with this issue in Python, I opted to change my SQL query that I was using to source the data. So instead of skipping dates, I wrote the query such that it would fill in missing dates with the value 0.

Upvotes: 0

RobertoDM
RobertoDM

Reputation: 31

It could happen if for examples the dates you are passing aren't sorted.

Look at this example:

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[-1:],
                                               example_ts.index[:-1]]), freq='D')

The previous code goes into your error, because of the non-sequential dates.

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[:-1],
                                               example_ts.index[-1:]]), freq='D')

This one runs correctly, instead.

Upvotes: 3

mrbTT
mrbTT

Reputation: 1409

I'm not sure if earlier versions of python have this, but 3.6 has this simple solution:

# 'b' stands for business days
# 'w' for weekly, 'd' for daily, and you get the idea...
df.index.freq = 'b' 

Upvotes: 8

JohnE
JohnE

Reputation: 30424

It seems to relate to missing dates as 3kt notes. You might be able to "fix" with asfreq('D') as EdChum suggests but that gives you a continuous index with missing data values. It works fine for some some sample data I made up:

df=pd.DataFrame({ 'x':[1,2,4] }, 
   index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) )

df
Out[756]: 
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], 
          dtype='datetime64[ns]', freq=None)

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]: 
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('d').index
Out[759]: 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')

More generally, and depending on what exactly you are trying to do, you might want to check out the following for other options like reindex & resample: Add missing dates to pandas dataframe

Upvotes: 13

Brad Solomon
Brad Solomon

Reputation: 40888

You have a couple options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset

I suspect that errors down the road are caused by the missing freq.

You are absolutely right. Here's what I use often:

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """

    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            freq = pd.infer_freq(idx)
        else:
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found to `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

print(add_freq(idx, freq='D'))  # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')

Using asfreq will actually reindex (fill) missing dates, so be careful of that if that's not what you're looking for.

The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin, but convenient wrapper around reindex which generates a date_range and calls reindex.

Upvotes: 23

Related Questions