CG7
CG7

Reputation: 111

Add binary classification columns according for month and season from existing date column

I have a dataframe with dates as such:

print(data)

          date      time  
0   2017-01-10  00:00:00        
1   2017-01-17  00:00:00        
2   2017-01-24  00:00:00        
3   2017-01-31  00:00:00        
4   2017-02-07  00:00:00        
..         ...       ...   
220 2021-04-27  00:00:00   
221 2021-05-03  00:00:00   
222 2021-05-10  00:00:00   
223 2021-05-17  00:00:00   
224 2021-05-25  00:00:00   

How can I add season columns winter,spring,summer,fall and month columns january -> december with binary encoding so that my dataframe looks like so:

print(data)

          date      time  winter  spring  summer  fall  january  february  etc.
0   2017-01-10  00:00:00       1       0       0     0        1         0   ...
1   2017-01-17  00:00:00       1       0       0     0        1         0   ...
2   2017-01-24  00:00:00       1       0       0     0        1         0   ...
3   2017-01-31  00:00:00       1       0       0     0        1         0   ...
4   2017-02-07  00:00:00       1       0       0     0        0         1   ...
..         ...       ...     ...     ...     ...   ...      ...       ...   ...
220 2021-04-27  00:00:00       0       1       0     0        0         0   ...
221 2021-05-03  00:00:00       0       1       0     0        0         0   ...
222 2021-05-10  00:00:00       0       1       0     0        0         0   ...
223 2021-05-17  00:00:00       0       1       0     0        0         0   ...
224 2021-05-25  00:00:00       0       1       0     0        0         0   ...

Upvotes: 0

Views: 252

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35646

Try:

  1. Converting dates to Month Names via Categorical.from_codes and calendar

  2. Converting months to seasons via some math manipulation of month numbers and convert to Categorical.from_codes

  3. Then calling pd.get_dummies on the new columns.

import calendar

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})

df['month'] = pd.Categorical.from_codes(
    df['date'].dt.month - 1,
    categories=list(calendar.month_name),
    ordered=True
)

df['season'] = pd.Categorical.from_codes(
    df['date'].dt.month % 12 // 3,
    categories=['winter', 'spring', 'summer', 'fall'],
    ordered=True
)

df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='')

Sample Output with Categorical:

        date  winter  spring  summer  ...  September  October  November  December
0 2017-02-01       1       0       0  ...          0        0         0         0
1 2017-03-01       0       1       0  ...          0        0         0         0
2 2017-04-01       0       1       0  ...          0        0         0         0
3 2017-05-01       0       1       0  ...          0        0         0         0
4 2017-06-01       0       0       1  ...          0        0         0         0
5 2017-07-01       0       0       1  ...          0        0         0         0
6 2017-08-01       0       0       1  ...          0        0         0         0
7 2017-09-01       0       0       0  ...          0        0         0         0
8 2017-10-01       0       0       0  ...          1        0         0         0
9 2017-11-01       0       0       0  ...          0        1         0         0

The benefit of categorical is that dummies will appear in the correct order rather than if they were strings like with:

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})

df['month'] = df['date'].dt.strftime('%B')

df['season'] = (
        df['date'].dt.month % 12 // 3
).replace({0: 'winter', 1: 'spring', 2: 'summer', 3: 'fall'})

df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='')

Sample Output with replace and dt.strftime (Notice seasons and months are in alphabetical order):

        date  fall  spring  summer  ...  May  November  October  September
0 2017-02-01     0       0       0  ...    0         0        0          0
1 2017-03-01     0       1       0  ...    0         0        0          0
2 2017-04-01     0       1       0  ...    0         0        0          0
3 2017-05-01     0       1       0  ...    1         0        0          0
4 2017-06-01     0       0       1  ...    0         0        0          0
5 2017-07-01     0       0       1  ...    0         0        0          0
6 2017-08-01     0       0       1  ...    0         0        0          0
7 2017-09-01     1       0       0  ...    0         0        0          1
8 2017-10-01     1       0       0  ...    0         0        1          0
9 2017-11-01     1       0       0  ...    0         1        0          0

Upvotes: 3

Related Questions