Markus
Markus

Reputation: 2455

Transpose column in a DataFrame into a binary matrix

Context

Lets say I have a pandas-DataFrame like this:

>>> data.head()
                            values  atTime
date        
2006-07-01 00:00:00+02:00   15.10   0000
2006-07-01 00:15:00+02:00   16.10   0015
2006-07-01 00:30:00+02:00   17.75   0030
2006-07-01 00:45:00+02:00   17.35   0045
2006-07-01 01:00:00+02:00   17.25   0100

atTime represents the hour and minute of the timestamp used as index. I want to transpose the atTime-column to a binary matrix (making it sparse is also an option), which will be used as nominal feature in a machine learning approach.

The desired result should look like:

>>> data.head()
                            values  0000  0015  0030  0045  0000
date        
2006-07-01 00:00:00+02:00   15.10   1     0     0     0     0
2006-07-01 00:15:00+02:00   16.10   0     1     0     0     0
2006-07-01 00:30:00+02:00   17.75   0     0     1     0     0
2006-07-01 00:45:00+02:00   17.35   0     0     0     1     0
2006-07-01 01:00:00+02:00   17.25   0     0     0     0     1

As might be anticipated, this matrix will be much larger when concidering all values in atTime.


My question

I can achieve the desired result with workarounds using apply and using the timestamps in order to create the new columns beforehand.

However, is there a build-in option in pandas (or via numpy, concidering atTime as numpy-array) to achieve the same without a workaround?

Upvotes: 2

Views: 416

Answers (1)

cs95
cs95

Reputation: 402872

This is a use case for get_dummies:

pd.get_dummies(df, columns=["atTime"]) 
                           values  atTime_0  atTime_15  atTime_30  atTime_45  atTime_100
date                                                                                    
2006-07-01 00:00:00+02:00   15.10         1          0          0          0           0
2006-07-01 00:15:00+02:00   16.10         0          1          0          0           0
2006-07-01 00:30:00+02:00   17.75         0          0          1          0           0
2006-07-01 00:45:00+02:00   17.35         0          0          0          1           0
2006-07-01 01:00:00+02:00   17.25         0          0          0          0           1

Solution updated with OP's recommendation. Thanks!

Upvotes: 8

Related Questions