Reputation:
I need to add some more columns to the dataset. Since there are 7 days in a week, for this reason, I added 7 additional columns in my dataset as "day_1", "day_2", ..."day_7" then for all timestamp data now I want to extract the day information. For example for a line that corresponds to Tuesday, only "day_2" column of should be "1" and other columns (day_1, day_3, day_4, day_5, day_6, day_7) should be all "0". for this task I want to use a procedure which is known as "one-hot encoding". How may I write code for this requirement?
#dataset['day_7'] = dataset.insert(0, 'day_7', 0)
#dataset['day_6'] = dataset.insert(0, 'day_6', 0)
#dataset['day_5'] = dataset.insert(0, 'day_5', 0)
#dataset['day_4'] = dataset.insert(0, 'day_4', 0)
#dataset['day_3'] = dataset.insert(0, 'day_3', 0)
#dataset['day_2'] = dataset.insert(0, 'day_2', 0)
#dataset['day_1'] = dataset.insert(0, 'day_1', 0)
Here is my Date column from dataset:
#0 2016-01-01 05:00:00
#1 2016-01-01 06:00:00
#2 2016-01-01 07:00:00
#3 2016-01-01 08:00:00
#4 2016-01-01 09:00:00
#5 2016-01-01 10:00:00
#6 2016-01-01 11:00:00
#7 2016-01-01 12:00:00
#8 2016-01-01 13:00:00
#9 2016-01-01 14:00:00
#10 2016-01-01 15:00:00
#11 2016-01-01 16:00:00
#12 2016-01-01 17:00:00
#13 2016-01-01 18:00:00
#14 2016-01-01 19:00:00
#15 2016-01-01 20:00:00
#16 2016-01-01 21:00:00
#17 2016-01-01 22:00:00
#18 2016-01-01 23:00:00
#19 2016-01-02 00:00:00
#20 2016-01-02 01:00:00
#21 2016-01-02 02:00:00
#22 2016-01-02 03:00:00
#23 2016-01-02 04:00:00
#24 2016-01-02 05:00:00
#25 2016-01-02 06:00:00
#26 2016-01-02 07:00:00
#27 2016-01-02 08:00:00
#28 2016-01-02 09:00:00
#29 2016-01-02 10:00:00
#30 2016-01-02 11:00:00
#31 2016-01-02 12:00:00
#32 2016-01-02 13:00:00
#33 2016-01-02 14:00:00
#34 2016-01-02 15:00:00
#35 2016-01-02 16:00:00
#36 2016-01-02 17:00:00
#37 2016-01-02 18:00:00
#38 2016-01-02 19:00:00
#39 2016-01-02 20:00:00
#40 2016-01-02 21:00:00
#41 2016-01-02 22:00:00
#42 2016-01-02 23:00:00
#43 2016-01-03 00:00:00
#44 2016-01-03 01:00:00
#45 2016-01-03 02:00:00
#46 2016-01-03 03:00:00
#47 2016-01-03 04:00:00
#48 2016-01-03 05:00:00
#49 2016-01-03 06:00:00
#50 2016-01-03 07:00:00
#51 2016-01-03 08:00:00
#52 2016-01-03 09:00:00
#53 2016-01-03 10:00:00
#54 2016-01-03 11:00:00
#55 2016-01-03 12:00:00
#56 2016-01-03 13:00:00
#57 2016-01-03 14:00:00
#58 2016-01-03 15:00:00
#59 2016-01-03 16:00:00
#60 2016-01-03 17:00:00
#61 2016-01-03 18:00:00
#62 2016-01-03 19:00:00
#63 2016-01-03 20:00:00
#64 2016-01-03 21:00:00
#65 2016-01-03 22:00:00
#66 2016-01-03 23:00:00
Upvotes: 1
Views: 414
Reputation: 239
### Import Libraries
from datetime import datetime
### Get some Data:
dateList = pd.date_range('2012-04-27 05:00:00 ', periods=24)
df = pd.DataFrame(data = dateList, columns = ['Date'])
### Get Day name
df['Day'] = df['Date'].dt.day_name()
### Generate Day columns
Days = pd.get_dummies(df.Day)
### Stitch the two the original and the Days DataFrames together
df = pd.concat([df.drop(['Day'], axis = 1), Days], axis = 1)
df
Date Friday Monday Saturday Sunday Thursday Tuesday Wednesday
0 2012-04-27 05:00:00 1 0 0 0 0 0 0
1 2012-04-28 05:00:00 0 0 1 0 0 0 0
2 2012-04-29 05:00:00 0 0 0 1 0 0 0
3 2012-04-30 05:00:00 0 1 0 0 0 0 0
4 2012-05-01 05:00:00 0 0 0 0 0 1 0
Upvotes: 0
Reputation: 201
You can use get_dummies :
import pandas as pd
df = pd.DataFrame({'timestamp': [
'2016-01-01 05:00:00', '2016-01-01 06:00:00', '2016-01-01 07:00:00', '2016-01-01 08:00:00',
'2016-01-02 00:00:00', '2016-01-02 06:00:00', '2016-01-02 07:00:00', '2016-01-02 08:00:00',
'2016-01-03 00:00:00', '2016-01-03 06:00:00', '2016-01-03 07:00:00', '2016-01-03 08:00:00',
'2016-01-04 00:00:00', '2016-01-04 06:00:00', '2016-01-04 07:00:00', '2016-01-04 08:00:00',
]})
# convert to datetime
df.timestamp = pd.to_datetime(df.timestamp)
# extract the day and add 1
df['day'] = df.timestamp.dt.dayofweek + 1 # Thanks @Mark Wang
# create one-hot encoding
df_onehot = pd.get_dummies(df.day, prefix='day')
# merge back
df = pd.concat([df,df_onehot], axis=1)
Upvotes: 1