Reputation: 5528
I've got a situation where I've got two columns, where one is the category id and another is the category value. Since the values have different meanings for different categories, I want to extract the values to separate columns. So for example, I'd like to convert
+----+--------+
| id | value |
+----+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+----+--------+
to this:
+----+---------+---------+---------+
| id | value_1 | value_2 | value_3 |
+----+---------+---------+---------+
| 1 | 1 | 0 | 0 |
| 1 | 2 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 2 | 0 | 2 | 0 |
| 3 | 0 | 0 | 1 |
| 3 | 0 | 0 | 2 |
+----+---------+---------+---------+
I'm sure I could do this with some numpy style manipulation, but I am trying to get a better grasp on pandas, and would like to do it in a pandas-y way.
My naive guess would be to use something with groupby
, but I am not sure how I would generate multiple columns from that. Something along the lines of
data.groupby('id').agg({'value': something})
Any help would be appreciated.
Upvotes: 2
Views: 39
Reputation: 210822
you can use get_dummies() and multiply it by value
column:
In [58]: df[['id']] \
.join(pd.get_dummies(df['id']).mul(df['value'], axis=0).add_prefix('value_'))
Out[58]:
id value_1 value_2 value_3
0 1 1 0 0
1 1 2 0 0
2 2 0 1 0
3 2 0 2 0
4 3 0 0 1
5 3 0 0 2
Explanation:
In [53]: pd.get_dummies(df['id'])
Out[53]:
1 2 3
0 1 0 0
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
5 0 0 1
In [52]: pd.get_dummies(df['id']).mul(df['value'], axis=0)
Out[52]:
1 2 3
0 1 0 0
1 2 0 0
2 0 1 0
3 0 2 0
4 0 0 1
5 0 0 2
Upvotes: 1