Roger Hung

Reputation: 15

Python: extract a Date column into several columns ('day of week', month, ...) in pandas

I am trying to convert a 'Date' column into several columns such as 'day of week', etc. I am not sure why the loop always gets stuck after about 2000 steps. Because there is quite a lot of data, I would also love to know if there is a faster way of doing this. Thank you.

trainset.head()

   Zone_ID        Date  Hour_slot  Hire_count
0        1  2016-02-01          0           0
1        1  2016-02-01          1           0
2        1  2016-02-01          2           0
3        1  2016-02-01          3           0
4        1  2016-02-01          4           0

trainset.shape

(219600, 4)

This is what I have so far:

import datetime

TrainSet = trainset.copy()
TrainSet['w'] = 0
TrainSet['j'] = 0
TrainSet['U'] = 0
TrainSet['W'] = 0

for i in range(trainset.shape[0]):
    TrainSet.loc[i, 'w'] = datetime.datetime.strptime(trainset.loc[i,'Date'], "%Y-%m-%d").strftime('%w')
    TrainSet.loc[i, 'j'] = datetime.datetime.strptime(trainset.loc[i,'Date'], "%Y-%m-%d").strftime('%j')
    TrainSet.loc[i, 'U'] = datetime.datetime.strptime(trainset.loc[i,'Date'], "%Y-%m-%d").strftime('%U')
    TrainSet.loc[i, 'W'] = datetime.datetime.strptime(trainset.loc[i,'Date'], "%Y-%m-%d").strftime('%W')
    print(i)

Upvotes: 1

Views: 147

Answers (1)

jpp

Reputation: 164623

You should use Pandas / NumPy methods on a datetime series rather than a manual loop. Here's a functional solution using operator.attrgetter:

from operator import attrgetter

import pandas as pd

# example dataframe
df = pd.DataFrame({'date': ['2017-05-01 15:00:20', '2018-11-30 10:01:11']})
df['date'] = pd.to_datetime(df['date'])

# list attributes
dt_attrs = ['year', 'hour', 'month', 'day', 'dayofweek']

# extract attributes
attributes = df['date'].apply(attrgetter(*dt_attrs))

# add attributes to dataframe
df[dt_attrs] = pd.DataFrame(attributes.values.tolist())

Result:

                 date  year  hour  month  day  dayofweek
0 2017-05-01 15:00:20  2017    15      5    1          0
1 2018-11-30 10:01:11  2018    10     11   30          4
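
For the specific columns in the question ('w', 'j', 'U', 'W'), the same idea can be applied through the .dt accessor without any Python-level loop. The following is only a minimal sketch: the three-row trainset is a made-up sample standing in for the real data, and it assumes 'Date' holds strings in %Y-%m-%d format as in the question.

```python
import pandas as pd

# Hypothetical sample standing in for the question's trainset
trainset = pd.DataFrame({
    'Zone_ID': [1, 1, 1],
    'Date': ['2016-02-01', '2016-02-07', '2016-12-31'],
})

# Parse the whole column once instead of calling strptime per row
dates = pd.to_datetime(trainset['Date'], format='%Y-%m-%d')

TrainSet = trainset.copy()

# %w: weekday with Sunday = 0; dayofweek uses Monday = 0, so shift by one
TrainSet['w'] = (dates.dt.dayofweek + 1) % 7

# %j: day of the year (1-366)
TrainSet['j'] = dates.dt.dayofyear

# %U / %W: week of year with Sunday / Monday as the first day of the week.
# There is no direct .dt attribute for these, so format and convert to int.
TrainSet['U'] = dates.dt.strftime('%U').astype(int)
TrainSet['W'] = dates.dt.strftime('%W').astype(int)

print(TrainSet)
```

Unlike the loop in the question, this stores integers rather than the zero-padded strings that strftime returns; drop the astype(int) calls if the string form is needed.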

Upvotes: 2
