Jimbo_007

Reputation: 13

Convert day numbers into dates in Python

How do you convert day numbers (1, 2, 3, ..., 728, 729, 730) to dates in Python? I can assign an arbitrary year to start the date count as the year doesn't matter to me.

I am learning time series analysis (ARIMA, SARIMA, etc.) using Python. I have a CSV dataset with two columns, 'Day' and 'Revenue'. The Day column contains the numbers 1-731, and Revenue contains numbers from 0 to 18.154... I have had success building the model, running statistical tests, building visualizations, etc. But when it comes to forecasting with Prophet, I am hitting a wall.

Here are the parts of the code that I think are relevant to the question:

# Load the CSV with pandas. index_col=0 turns the "Day" column into the index.
# parse_dates=True leaves the integer day numbers unchanged (see the Int64Index below).
df = read_csv("telco_time_series.csv", index_col=0, parse_dates=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 731 entries, 1 to 731
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Revenue  731 non-null    float64
dtypes: float64(1)
memory usage: 11.4 KB
df.head()
     Revenue
Day 
1   0.000000
2   0.000793
3   0.825542
4   0.320332
5   1.082554

# Instantiate the model
model = ARIMA(df, order=(4,1,0))

# Fit the model
results = model.fit()

# Print summary
print(results.summary())

# Line plot of residuals
residuals = results.resid
residuals.plot()
plt.show()

# density plot of residuals
residuals.plot(kind='kde')
plt.show()

# summary stats of residuals
print(residuals.describe())

SARIMAX Results                                
==============================================================================
Dep. Variable:                Revenue   No. Observations:                  731
Model:                 ARIMA(4, 1, 0)   Log Likelihood                -489.105
Date:                Tue, 03 Aug 2021   AIC                            988.210
Time:                        07:29:55   BIC                           1011.175
Sample:                             0   HQIC                           997.070
                            - 731                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.4642      0.037    -12.460      0.000      -0.537      -0.391
ar.L2          0.0295      0.040      0.746      0.456      -0.048       0.107
ar.L3          0.0618      0.041      1.509      0.131      -0.018       0.142
ar.L4          0.0366      0.039      0.946      0.344      -0.039       0.112
sigma2         0.2235      0.013     17.629      0.000       0.199       0.248
===================================================================================
Ljung-Box (L1) (Q):                   0.01   Jarque-Bera (JB):                 2.52
Prob(Q):                              0.90   Prob(JB):                         0.28
Heteroskedasticity (H):               1.01   Skew:                            -0.05
Prob(H) (two-sided):                  0.91   Kurtosis:                         2.73
===================================================================================

df.columns=['ds','y']
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

m = Prophet()
m.fit(df)

ValueError: Dataframe must have columns "ds" and "y" with the dates and values 
respectively.

The forecast works with Prophet if I replace the day numbers in the CSV with dates by hand, but I would like to convert the Day numbers within the code using pandas.

Any ideas?

Upvotes: 1

Views: 1064

Answers (1)

Daweo

Reputation: 36520

I can assign an arbitrary year to start the date count as the year doesn't matter to me (...) Any ideas?

You can use datetime.timedelta for this. Pick any date you like as day 0, then add datetime.timedelta(days=x), where x is your day number. For example:

import datetime
day0 = datetime.date(2000,1,1)
day120 = day0 + datetime.timedelta(days=120)
print(day120)

Output:

2000-04-30
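
Note that with day 0 set to 2000-01-01, day number 1 lands on 2000-01-02. If day 1 should be the start date itself, subtract one from the day number. A minimal sketch of that variant, where day_number_to_date and first_day are hypothetical names chosen for illustration:

```python
import datetime


def day_number_to_date(day_number, first_day=datetime.date(2000, 1, 1)):
    """Map a 1-based day number to a date (first_day is an arbitrary choice)."""
    # Day 1 is first_day itself, so the offset is day_number - 1.
    return first_day + datetime.timedelta(days=day_number - 1)


print(day_number_to_date(1))    # 2000-01-01
print(day_number_to_date(731))  # 2001-12-31 (2000 is a leap year: 366 + 365 days)
```

Which convention to use is a matter of taste here, since the start year is arbitrary anyway.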

Wrap the conversion in a function and use .apply if you have a pandas.DataFrame, like so:

import datetime
import pandas as pd
def convert_to_date(x):
    return datetime.date(2000,1,1)+datetime.timedelta(days=x)
df = pd.DataFrame({'day_n':[1,2,3,4,5]})
df['day_date'] = df['day_n'].apply(convert_to_date)
print(df)

Output:

   day_n    day_date
0      1  2000-01-02
1      2  2000-01-03
2      3  2000-01-04
3      4  2000-01-05
4      5  2000-01-06
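
For the frame in the question, 'Day' is the index, so there is only one real column; that is probably why assigning two column names raised the length mismatch. A possible way to get the "ds"/"y" layout Prophet asks for is to move 'Day' back into a column and convert it in one vectorized step with pd.to_timedelta instead of .apply. This is only a sketch: the five rows stand in for the real CSV, and day0 and prophet_df are names I made up.

```python
import pandas as pd

# Stand-in for the question's frame: 'Day' as an integer index, 'Revenue' as values.
df = pd.DataFrame(
    {"Revenue": [0.000000, 0.000793, 0.825542, 0.320332, 1.082554]},
    index=pd.Index([1, 2, 3, 4, 5], name="Day"),
)

day0 = pd.Timestamp("2000-01-01")  # arbitrary start date (assumption)

# Move 'Day' out of the index so it becomes an ordinary column.
prophet_df = df.reset_index()

# Vectorized conversion: day0 plus a whole-day offset for every row.
prophet_df["ds"] = day0 + pd.to_timedelta(prophet_df["Day"], unit="D")

# Keep only the two columns Prophet expects, named 'ds' and 'y'.
prophet_df = prophet_df.rename(columns={"Revenue": "y"})[["ds", "y"]]

print(prophet_df)
```

The resulting frame has a datetime "ds" column and a float "y" column, which should be what m.fit(prophet_df) needs, though I have not run it against the actual dataset.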

Upvotes: 1
