Reputation: 179
I want to convert a DataFrame column containing string values such as 2020Q2
to period type. I tried the following solution: https://stackoverflow.com/a/40447216/13010940 but got the following error: TypeError: Incorrect dtype.
import pandas as pd
x=pd.DataFrame({'col':['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2']})
x['period']=pd.PeriodIndex(x.col, freq='Q-Oct')
I tried PeriodIndex for a single string, too.
pd.PeriodIndex('2020Q2', freq='Q-Oct')
This also gives an error: ValueError: Given date string not likely a datetime.
Of course, I can convert the string to datetime first and then convert it to a period.
x['period']=pd.to_datetime(x.col).dt.to_period(freq='Q-oct')
and
pd.to_datetime('2020Q2').to_period(freq='Q-oct')
But I think there is a nicer solution.
Upvotes: 2
Views: 993
Reputation: 402563
Your method is correct. I believe (?) this is a regression introduced after 0.23 that causes PeriodIndex to reject period strings passed as a Series. Try passing the values as a list or array instead:
pd.__version__
# '1.0.4'
pd.PeriodIndex(x['col'], freq='Q-Oct')
# TypeError: Incorrect dtype
# pd.PeriodIndex(x['col'].to_numpy(), freq='Q-Oct') # also works
pd.PeriodIndex(x['col'].tolist(), freq='Q-Oct')
# PeriodIndex(['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2'],
# dtype='period[Q-OCT]', freq='Q-OCT')
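To get the result back onto the DataFrame, the same list workaround can be assigned to a column. A minimal sketch reusing `x` from the question; the uppercase `Q-OCT` spelling is my own choice here, not something the question requires:

```python
import pandas as pd

x = pd.DataFrame({'col': ['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2']})

# Passing a plain list sidesteps the Series handling that raises on 1.0.x
x['period'] = pd.PeriodIndex(x['col'].tolist(), freq='Q-OCT')

# The new column should carry a period dtype rather than object
print(x['period'].dtype)
```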
This works on 1.1:
pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'
pd.PeriodIndex(x['col'], freq='Q-Oct')
# PeriodIndex(['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2'],
# dtype='period[Q-OCT]', freq='Q-OCT')
When the time is right, just upgrade!
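As for the single-string attempt in the question: PeriodIndex is meant for list-like input, so a bare string is probably not the right thing to pass it. For one value, pd.Period is the scalar counterpart. A small sketch, again assuming the `Q-OCT` spelling:

```python
import pandas as pd

# Scalar case: build a single Period directly from the string
p = pd.Period('2020Q2', freq='Q-OCT')

# Wrapping the string in a list gives PeriodIndex something list-like
idx = pd.PeriodIndex(['2020Q2'], freq='Q-OCT')

# Both routes should describe the same fiscal quarter
print(p, idx[0] == p)
```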
Upvotes: 4