Reputation: 55
Here is a sample with date format:
data = pd.DataFrame({'Quarter':['Q1_01','Q2_01', 'Q3_01', 'Q4_01', 'Q1_02','Q2_02']
, 'Sale' :[10, 20, 30, 40, 50, 60]})
print(data)
# Quarter Sale
#0 Q1_01 10
#1 Q2_01 20
#2 Q3_01 30
#3 Q4_01 40
#4 Q1_02 50
#5 Q2_02 60
print(data.dtypes)
# Quarter object
# Sale int64
Would like to convert Quarter column into Pandas datetime format like 'Jan-2001' or '01-2001' that can be used in fbProphet for time series analysis.
Tried using strptime but got an error TypeError: strptime() argument 1 must be str, not Series
from datetime import datetime
data['Quarter'] = datetime.strptime(data['Quarter'], 'Q%q_%y')
What is the cause of the error ? Any better solution?
Upvotes: 2
Views: 2545
Reputation: 402553
Knowing the format to_datetime
needs to pass period indices is helpful (it is along the lines of YYYY-QX
), so we start with replace
, then to_datetime
and finally strftime
:
u = df.Quarter.str.replace(r'(Q\d)_(\d+)', r'20\2-\1')
pd.to_datetime(u).dt.strftime('%b-%Y')
0 Jan-2001
1 Apr-2001
2 Jul-2001
3 Oct-2001
4 Jan-2002
5 Apr-2002
Name: Quarter, dtype: object
The month represents the start of its respective quarter.
If the dates can range across the 90s and the 2000s, then let's try something different:
df = pd.DataFrame({'Quarter':['Q1_98','Q2_99', 'Q3_01', 'Q4_01', 'Q1_02','Q2_02']})
dt = pd.to_datetime(df.Quarter.str.replace(r'(Q\d)_(\d+)', r'\2-\1'))
(dt.where(dt <= pd.to_datetime('today'), dt - pd.DateOffset(years=100))
.dt.strftime('%b-%Y'))
0 Jan-1998
1 Apr-1999
2 Jul-2001
3 Oct-2001
4 Jan-2002
5 Apr-2002
Name: Quarter, dtype: object
pd.to_datetime
auto-parses "98" as "2098", so we do a little fix to subtract 100 years from dates later than "today's date".
This hack will stop working in a few decades. Ye pandas gods, have mercy on my soul :-)
Another option is parsing to PeriodIndex
:
(pd.PeriodIndex(df.Quarter.str.replace(r'(Q\d)_(\d+)', r'20\2-\1'), freq='Q')
.strftime('%b-%Y'))
# Index(['Mar-2001', 'Jun-2001', 'Sep-2001',
# 'Dec-2001', 'Mar-2002', 'Jun-2002'], dtype='object')
Here, the months printed out are at the ends of their respective quarters. You decide what to use.
Upvotes: 3