Pad

Reputation: 911

Error converting month number to datetime object

I'm trying to convert the index of a dataframe from month numbers 1-420 (35 years of monthly data, starting in January 1981 to match the origin used below) to datetime objects.

Sample dataframe:

import pandas as pd
import numpy as np
dates = pd.Series(range(1,421))
df2 = pd.DataFrame(np.random.randn(420,4),index=dates,columns=list('ABCD'))

Convert index to a datetime object:

df2.index = pd.to_datetime(df2.index,unit='M', origin='1981-01-01')

Gives the error:

ValueError: cannot cast unit M

I don't understand why it won't cast the unit 'M'. When I use 'd' instead of 'M' it works and the index increases daily, so why won't it increase monthly? I got the units from here.

With 'm', the output looks like this:

                       A           B            C            D
1981-01-01 00:01:00 0.672397    0.753926    0.865845    0.711594
1981-01-01 00:02:00 0.786754    0.658421    -0.111609   -1.459447
1981-01-01 00:03:00 0.200273    -1.485525   -1.939203   0.921833
1981-01-01 00:04:00 -1.589668   0.109760    -1.349790   -1.951316
1981-01-01 00:05:00 0.133847    -0.359300   -1.246740   -0.835645
1981-01-01 00:06:00 -0.843962   1.222129    -0.121450   -1.223132
1981-01-01 00:07:00 -0.818932   0.731127    0.984731    -1.028384

This increases in minutes. I want it to increase in months, like this:

                            A           B            C           D
    1981-01-01 00:00:00 0.672397    0.753926    0.865845    0.711594
    1981-02-01 00:00:00 0.786754    0.658421    -0.111609   -1.459447
    1981-03-01 00:00:00 0.200273    -1.485525   -1.939203   0.921833

Upvotes: 0

Views: 65

Answers (1)

harpan

Reputation: 8631

You should use date_range:

df2.index = pd.date_range('1981/1/1', periods=len(df2), freq='MS')

Output:

                A           B           C            D
1981-01-01  -0.761933   0.726808    0.589712    -1.170934
1981-02-01  0.030521    -0.892427   -1.366809   -1.515724
1981-03-01  -0.282887   1.068047    0.244493    -0.247356

Have a look at the pandas offset aliases for more information on freq values such as 'MS' (month start).
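As for why unit='M' fails: as far as I can tell, to_datetime only handles fixed-length units, and a month has no fixed length. Treat that as my reading rather than a documented guarantee. Also note that date_range assigns dates by row position, so it only lines up if the index is a clean 1..N run. If the month numbers can have gaps or arrive out of order, a minimal sketch using Period arithmetic, assuming month 1 means January 1981, might look like this:

```python
import numpy as np
import pandas as pd

# Sample frame from the question: index holds month numbers 1..420.
month_numbers = pd.Series(range(1, 421))
df2 = pd.DataFrame(np.random.randn(420, 4), index=month_numbers, columns=list("ABCD"))

# Assumption: month number 1 corresponds to January 1981.
start = pd.Period("1981-01", freq="M")

# Adding an integer to a monthly Period shifts it by that many months.
periods = pd.PeriodIndex([start + (int(n) - 1) for n in df2.index], freq="M")

# Convert each monthly period to a timestamp at the first day of the month.
df2.index = periods.to_timestamp()

print(df2.head(3))
```

Unlike date_range, this depends on the values in the index rather than on how many rows there are.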

EDIT: As the OP said, the 420 months repeat over 200,000 rows. The code below provides a repeated index.

daterange = pd.date_range('1981/1/1', periods=420, freq='MS') 

Then expand it to fit your dataframe by repeating it.

# reps = number of full 420-month cycles, rest = rows left over after them
reps, rest = divmod(len(df2), len(daterange))
df2.index = list(daterange) * reps + list(daterange[:rest])
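An alternative sketch for the repeated index that avoids building Python lists: index into the date range with row positions taken modulo 420. The row count of 1000 is made up for illustration (the real frame is said to have about 200,000 rows), and this assumes the rows are ordered so that the cycle restarts every 420 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose length is not a multiple of 420.
n_rows = 1000
df2 = pd.DataFrame(np.random.randn(n_rows, 4), columns=list("ABCD"))

# One full cycle of 420 month-start dates beginning January 1981.
daterange = pd.date_range("1981-01-01", periods=420, freq="MS")

# Row i gets the date at position i % 420, so the cycle wraps automatically.
positions = np.arange(len(df2)) % len(daterange)
df2.index = daterange[positions]

print(df2.iloc[418:422])
```

The partial cycle at the end is handled by the modulo, so no separate slicing step is needed.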

Upvotes: 2
