Reputation: 111
I have a dataframe with dates as such:
print(data)
date time
0 2017-01-10 00:00:00
1 2017-01-17 00:00:00
2 2017-01-24 00:00:00
3 2017-01-31 00:00:00
4 2017-02-07 00:00:00
.. ... ...
220 2021-04-27 00:00:00
221 2021-05-03 00:00:00
222 2021-05-10 00:00:00
223 2021-05-17 00:00:00
224 2021-05-25 00:00:00
How can I add season columns winter
,spring
,summer
,fall
and month columns january
-> december
with binary encoding so that my dataframe looks like so:
print(data)
date time winter spring summer fall january february etc.
0 2017-01-10 00:00:00 1 0 0 0 1 0 ...
1 2017-01-17 00:00:00 1 0 0 0 1 0 ...
2 2017-01-24 00:00:00 1 0 0 0 1 0 ...
3 2017-01-31 00:00:00 1 0 0 0 1 0 ...
4 2017-02-07 00:00:00 1 0 0 0 0 1 ...
.. ... ... ... ... ... ... ... ... ...
220 2021-04-27 00:00:00 0 1 0 0 0 0 ...
221 2021-05-03 00:00:00 0 1 0 0 0 0 ...
222 2021-05-10 00:00:00 0 1 0 0 0 0 ...
223 2021-05-17 00:00:00 0 1 0 0 0 0 ...
224 2021-05-25 00:00:00 0 1 0 0 0 0 ...
Upvotes: 0
Views: 252
Reputation: 35646
Try:
Converting dates to Month Names via Categorical.from_codes
and calendar
Converting months to seasons via some math manipulation of month numbers and convert to Categorical.from_codes
Then calling pd.get_dummies
on the new columns.
import calendar
import pandas as pd
df = pd.DataFrame({
'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})
df['month'] = pd.Categorical.from_codes(
df['date'].dt.month - 1,
categories=list(calendar.month_name),
ordered=True
)
df['season'] = pd.Categorical.from_codes(
df['date'].dt.month % 12 // 3,
categories=['winter', 'spring', 'summer', 'fall'],
ordered=True
)
df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='')
Sample Output with Categorical:
date winter spring summer ... September October November December
0 2017-02-01 1 0 0 ... 0 0 0 0
1 2017-03-01 0 1 0 ... 0 0 0 0
2 2017-04-01 0 1 0 ... 0 0 0 0
3 2017-05-01 0 1 0 ... 0 0 0 0
4 2017-06-01 0 0 1 ... 0 0 0 0
5 2017-07-01 0 0 1 ... 0 0 0 0
6 2017-08-01 0 0 1 ... 0 0 0 0
7 2017-09-01 0 0 0 ... 0 0 0 0
8 2017-10-01 0 0 0 ... 1 0 0 0
9 2017-11-01 0 0 0 ... 0 1 0 0
The benefit of categorical is that dummies will appear in the correct order rather than if they were strings like with:
import pandas as pd
df = pd.DataFrame({
'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})
df['month'] = df['date'].dt.strftime('%B')
df['season'] = (
df['date'].dt.month % 12 // 3
).replace({0: 'winter', 1: 'spring', 2: 'summer', 3: 'fall'})
df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='')
Sample Output with replace
and dt.strftime
(Notice seasons and months are in alphabetical order):
date fall spring summer ... May November October September
0 2017-02-01 0 0 0 ... 0 0 0 0
1 2017-03-01 0 1 0 ... 0 0 0 0
2 2017-04-01 0 1 0 ... 0 0 0 0
3 2017-05-01 0 1 0 ... 1 0 0 0
4 2017-06-01 0 0 1 ... 0 0 0 0
5 2017-07-01 0 0 1 ... 0 0 0 0
6 2017-08-01 0 0 1 ... 0 0 0 0
7 2017-09-01 1 0 0 ... 0 0 0 1
8 2017-10-01 1 0 0 ... 0 0 1 0
9 2017-11-01 1 0 0 ... 0 1 0 0
Upvotes: 3