PybabyR
PybabyR

Reputation: 55

Convert Q1-Q4 period strings to datetime using pandas

Here is a sample with date format:

data = pd.DataFrame({'Quarter':['Q1_01','Q2_01', 'Q3_01', 'Q4_01', 'Q1_02','Q2_02']
                     , 'Sale' :[10, 20, 30, 40, 50, 60]})

print(data)

#   Quarter  Sale
#0  Q1_01    10
#1  Q2_01    20
#2  Q3_01    30
#3  Q4_01    40
#4  Q1_02    50
#5  Q2_02    60

print(data.dtypes)
# Quarter    object
# Sale        int64

Would like to convert Quarter column into Pandas datetime format like 'Jan-2001' or '01-2001' that can be used in fbProphet for time series analysis.

Tried using strptime but got an error TypeError: strptime() argument 1 must be str, not Series

from datetime import datetime

data['Quarter'] = datetime.strptime(data['Quarter'], 'Q%q_%y')

What is the cause of the error ? Any better solution?

Upvotes: 2

Views: 2545

Answers (1)

cs95
cs95

Reputation: 402553

Knowing the format to_datetime needs to pass period indices is helpful (it is along the lines of YYYY-QX), so we start with replace, then to_datetime and finally strftime:

u = df.Quarter.str.replace(r'(Q\d)_(\d+)', r'20\2-\1')
pd.to_datetime(u).dt.strftime('%b-%Y')

0    Jan-2001
1    Apr-2001
2    Jul-2001
3    Oct-2001
4    Jan-2002
5    Apr-2002
Name: Quarter, dtype: object

The month represents the start of its respective quarter.


If the dates can range across the 90s and the 2000s, then let's try something different:

df = pd.DataFrame({'Quarter':['Q1_98','Q2_99', 'Q3_01', 'Q4_01', 'Q1_02','Q2_02']}) 

dt = pd.to_datetime(df.Quarter.str.replace(r'(Q\d)_(\d+)', r'\2-\1'))
(dt.where(dt <= pd.to_datetime('today'), dt - pd.DateOffset(years=100))
   .dt.strftime('%b-%Y'))

0    Jan-1998
1    Apr-1999
2    Jul-2001
3    Oct-2001
4    Jan-2002
5    Apr-2002
Name: Quarter, dtype: object

pd.to_datetime auto-parses "98" as "2098", so we do a little fix to subtract 100 years from dates later than "today's date".

This hack will stop working in a few decades. Ye pandas gods, have mercy on my soul :-)


Another option is parsing to PeriodIndex:

(pd.PeriodIndex(df.Quarter.str.replace(r'(Q\d)_(\d+)', r'20\2-\1'), freq='Q')
   .strftime('%b-%Y'))
# Index(['Mar-2001', 'Jun-2001', 'Sep-2001', 
#        'Dec-2001', 'Mar-2002', 'Jun-2002'], dtype='object')

Here, the months printed out are at the ends of their respective quarters. You decide what to use.

Upvotes: 3

Related Questions