Reputation: 15
I have a dataframe that has 4 columns: UID, Date, Type, and Value as such:
UID Date Type Value
50 2020-12-01 3 15
50 2020-12-01 2 13
50 2020-12-01 1 50
135 2020-12-02 2 0
135 2020-12-02 1 12
50 2020-12-02 4 100
50 2020-12-02 2 25
50 2020-12-02 3 15
50 2020-12-02 1 40
For a given user on a given date, each type that appears is unique (i.e there will never be two entries for Type X for UID Y on Day Z). Type is an integer between 1 and 4, inclusive.
I would like to transform this into a dataframe that has a column for each type, the value in the corresponding column, and reduces all rows to having a unique UID/Date pair as such, with missing type/value pairs as nan or 0:
UID Date Type_1 Type_2 Type_3 Type_4
50 2020-12-01 50 13 15 nan
135 2020-12-02 12 0 nan nan
50 2020-12-02 40 25 15 100
I've been tinkering with pivot but can't quite get it, any assistance would be much appreciated!
Upvotes: 1
Views: 998
Reputation: 8219
It is simply
df.pivot(index = ['UID','Date'], values = 'Value', columns = 'Type').add_prefix('Type_')
output
Type Type_1 Type_2 Type_3 Type_4
UID Date
50 2020-12-01 50.0 13.0 15.0 NaN
2020-12-02 40.0 25.0 15.0 100.0
135 2020-12-02 12.0 0.0 NaN NaN
you can stick reset_index()
at the end of the expression if you do not like those columns in the index
Upvotes: 2