sretko
sretko

Reputation: 640

Pandas Frequency Conversion

I'm trying to find if is possible to use data.asfreq(MonthEnd()) with no date_range created data. What I'm trying to achive. I run csv query with the following code:

import numpy as np
import pandas as pd
data = pd.read_csv("https://www.quandl.com/api/v3/datasets/FRED/GDPC1.csv?api_key=", parse_dates=True)
data.columns = ["period", "integ"]
data['period'] =  pd.to_datetime(data['period'], infer_datetime_format=True)

Then I want to assign frequency to my 'period' column by doing this:

tdelta = data.period[1] - data.period[0]
data.period.freq = tdelta

And some print comands:

print(data)
print(data.period.freq)
print(data.dtypes)

Returns:

..........
270 1948-07-01   2033.2
271 1948-04-01   2021.9
272 1948-01-01   1989.5
273 1947-10-01   1960.7
274 1947-07-01   1930.3
275 1947-04-01   1932.3
276 1947-01-01   1934.5

[277 rows x 2 columns]
-92 days +00:00:00
period    datetime64[ns]
integ            float64
dtype: object

I can also parse the original 'DATE' column by making it 'index':

data = pd.read_csv("https://www.quandl.com/api/v3/datasets/FRED/GDPC1.csv?api_key=", parse_dates=True, index_col='DATE')

What I want to do is just to covert the quarterly data in to monthly rows. For example:

270 1948-07-01   2033.2
271 1948-06-01   NaN
272 1948-05-01   NaN
273 1948-04-01   2021.9
274 1948-03-01   NaN
275 1948-02-01   NaN
276 1948-01-01   1989.5
......and so on.......

I'm eventually trying to do this by using ts.asfreq(MonthBegin()) and , ts.asfreq(MonthBegin(), method='pad'). So far unsuccessfully. I have the following error:

NameError: name 'MonthBegin' is not defined

My question is can I use asfreq if I don't use date_range to create the frame? Somehow to 'pass' my date column to the function. If this is not the solution is it there any other easy way to convert quarterly to monthly frequency?

Upvotes: 1

Views: 1017

Answers (1)

Alicia Garcia-Raboso
Alicia Garcia-Raboso

Reputation: 13923

Use a TimeGrouper:

import pandas as pd

periods = ['1948-07-01', '1948-04-01', '1948-01-01', '1947-10-01',
           '1947-07-01', '1947-04-01', '1947-01-01']
integs = [2033.2, 2021.9, 1989.5, 1960.7, 1930.3, 1932.3, 1934.5]
df = pd.DataFrame({'period': pd.to_datetime(periods), 'integ': integs})
df = df.set_index('period')

df = df.groupby(pd.TimeGrouper('MS')).sum().sort_index(ascending=False)

EDIT: You can also use resample instead of a TimeGrouper:

df.resample('MS').sum().sort_index(ascending=False)

Upvotes: 2

Related Questions