Ani

Reputation: 179

passing Series to pd.PeriodIndex in Pandas results in TypeError: Incorrect dtype

I want to convert a DataFrame column containing string values such as 2020Q2 to period type. I tried the following solution: https://stackoverflow.com/a/40447216/13010940 but got the following error: TypeError: Incorrect dtype.

import pandas as pd
x=pd.DataFrame({'col':['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2']})
x['period']=pd.PeriodIndex(x.col, freq='Q-Oct')

I tried PeriodIndex for a single string, too.

pd.PeriodIndex('2020Q2', freq='Q-Oct')

This also gives an error: ValueError: Given date string not likely a datetime.

Of course, I can convert the string to datetime first and then convert it to a period.

x['period']=pd.to_datetime(x.col).dt.to_period(freq='Q-oct')

and

pd.to_datetime('2020Q2').to_period(freq='Q-oct')

But I think there is a nicer solution.

Upvotes: 2

Views: 993

Answers (1)

cs95

Reputation: 402563

This is a regression bug that has been fixed for version 1.1. Please see GH26109.

Your method is correct. I believe this is a regression introduced after 0.23 that stops pd.PeriodIndex from accepting a Series of period strings. As a workaround, convert the column to a list or array first:

pd.__version__
# '1.0.4'

pd.PeriodIndex(x['col'], freq='Q-Oct')
# TypeError: Incorrect dtype

# pd.PeriodIndex(x['col'].to_numpy(), freq='Q-Oct')  # also works
pd.PeriodIndex(x['col'].tolist(), freq='Q-Oct')
# PeriodIndex(['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2'], 
#             dtype='period[Q-OCT]', freq='Q-OCT')
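To write the result back into the DataFrame, as the question does, the workaround above can be sketched like this. It is a minimal example that rebuilds the question's frame. It assumes the uppercase alias 'Q-OCT' (the spelling pandas reports in the dtype) and relies on an Index being assigned to a column by position, so dropping the Series index with .tolist() does not reorder anything.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
x = pd.DataFrame({'col': ['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2']})

# Pass a plain list instead of the Series to avoid the regression.
periods = pd.PeriodIndex(x['col'].tolist(), freq='Q-OCT')

# An Index is assigned positionally, so the rows stay in their original order.
x['period'] = periods

print(x.dtypes)
```

The same assignment should also work with x['col'].to_numpy() in place of .tolist().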

This works on 1.1:

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

pd.PeriodIndex(x['col'], freq='Q-Oct')
# PeriodIndex(['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2'], 
#             dtype='period[Q-OCT]', freq='Q-OCT')

When the time is right, just upgrade!
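As for the single-string attempt in the question: pd.PeriodIndex expects a collection, so a bare string is not a valid input regardless of the regression. A minimal sketch of two alternatives, assuming a scalar period is what is wanted (pd.Period is the scalar constructor; the variable names are illustrative only):

```python
import pandas as pd

# Scalar case: build a single Period directly from the string.
p = pd.Period('2020Q2', freq='Q-OCT')

# Collection case: wrap the string in a list to get a one-element PeriodIndex.
idx = pd.PeriodIndex(['2020Q2'], freq='Q-OCT')

print(p)
print(idx)
```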

Upvotes: 4
