Reputation: 2455
Context
Lets say I have a pandas-DataFrame like this:
>>> data.head()
values atTime
date
2006-07-01 00:00:00+02:00 15.10 0000
2006-07-01 00:15:00+02:00 16.10 0015
2006-07-01 00:30:00+02:00 17.75 0030
2006-07-01 00:45:00+02:00 17.35 0045
2006-07-01 01:00:00+02:00 17.25 0100
atTime represents the hour and minute of the timestamp used as index. I want to transpose the atTime-column to a binary matrix (making it sparse is also an option), which will be used as nominal feature in a machine learning approach.
The desired result should look like:
>>> data.head()
values 0000 0015 0030 0045 0000
date
2006-07-01 00:00:00+02:00 15.10 1 0 0 0 0
2006-07-01 00:15:00+02:00 16.10 0 1 0 0 0
2006-07-01 00:30:00+02:00 17.75 0 0 1 0 0
2006-07-01 00:45:00+02:00 17.35 0 0 0 1 0
2006-07-01 01:00:00+02:00 17.25 0 0 0 0 1
As might be anticipated, this matrix will be much larger when concidering all values in atTime.
My question
I can achieve the desired result with workarounds using apply
and using the timestamps in order to create the new columns beforehand.
However, is there a build-in option in pandas (or via numpy, concidering atTime as numpy-array) to achieve the same without a workaround?
Upvotes: 2
Views: 416
Reputation: 402872
This is a use case for get_dummies
:
pd.get_dummies(df, columns=["atTime"])
values atTime_0 atTime_15 atTime_30 atTime_45 atTime_100
date
2006-07-01 00:00:00+02:00 15.10 1 0 0 0 0
2006-07-01 00:15:00+02:00 16.10 0 1 0 0 0
2006-07-01 00:30:00+02:00 17.75 0 0 1 0 0
2006-07-01 00:45:00+02:00 17.35 0 0 0 1 0
2006-07-01 01:00:00+02:00 17.25 0 0 0 0 1
Solution updated with OP's recommendation. Thanks!
Upvotes: 8