Reputation: 8314
I have a DataFrame that looks like this:
  user_id category  frequency
0   user1     cat1          4
1   user2     cat2          1
2   user2     cat3          4
3   user3     cat3          1
4   user3     cat4          3
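For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows (a minimal sketch; the variable name df is just what the rest of the thread assumes):

```python
import pandas as pd

# Rebuild the example frame shown above: one row per (user, category) pair
# that actually occurs, with its observed frequency.
df = pd.DataFrame({
    "user_id": ["user1", "user2", "user2", "user3", "user3"],
    "category": ["cat1", "cat2", "cat3", "cat3", "cat4"],
    "frequency": [4, 1, 4, 1, 3],
})
```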
For each user I have associated categories with their frequencies. In total, there are 4 categories (cat1, cat2, cat3, cat4), and I would like to expand the data of each user by adding the missing categories with frequency equal to zero.
So the expected outcome is:
   user_id category  frequency
0    user1     cat1          4
1    user1     cat2          0
2    user1     cat3          0
3    user1     cat4          0
4    user2     cat1          0
5    user2     cat2          1
6    user2     cat3          4
7    user2     cat4          0
8    user3     cat1          0
9    user3     cat2          0
10   user3     cat3          1
11   user3     cat4          3
So now each user has all four categories. Is there a straightforward way to achieve that?
Upvotes: 1
Views: 462
Reputation: 109756
You can create a pivot table on user_id and category, fill NaN values with zero, stack category (which makes the DataFrame indexed on user_id and category), and then reset the index to match the desired output.
>>> (df.pivot(index='user_id', columns='category', values='frequency')
...    .fillna(0)
...    .stack()
...    .reset_index())
   user_id category  0
0    user1     cat1  4
1    user1     cat2  0
2    user1     cat3  0
3    user1     cat4  0
4    user2     cat1  0
5    user2     cat2  1
6    user2     cat3  4
7    user2     cat4  0
8    user3     cat1  0
9    user3     cat2  0
10   user3     cat3  1
11   user3     cat4  3
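Note that the stacked values land in a column labelled 0 rather than frequency, and filling NaN typically leaves the values as floats. A minimal, self-contained sketch that names the column and casts back to integers might look like the following. The sample frame is rebuilt from the question, and the helper name expand_categories is hypothetical. The second variant is a different technique (reindexing against the full user/category product instead of pivoting), shown only as an alternative that should give the same rows:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "user_id": ["user1", "user2", "user2", "user3", "user3"],
    "category": ["cat1", "cat2", "cat3", "cat3", "cat4"],
    "frequency": [4, 1, 4, 1, 3],
})


def expand_categories(frame):
    """Pivot-based expansion (hypothetical helper name).

    Assumes each (user_id, category) pair appears at most once;
    pivot raises a ValueError on duplicate pairs.
    """
    wide = frame.pivot(index="user_id", columns="category", values="frequency")
    return (
        wide.fillna(0)        # missing user/category pairs become 0
        .astype(int)          # undo the float upcast caused by NaN
        .stack()              # back to long form, indexed by (user_id, category)
        .reset_index(name="frequency")  # name the value column explicitly
    )


result = expand_categories(df)

# Alternative: build every (user, category) combination and reindex onto it.
full_index = pd.MultiIndex.from_product(
    [df["user_id"].unique(), sorted(df["category"].unique())],
    names=["user_id", "category"],
)
alt = (
    df.set_index(["user_id", "category"])["frequency"]
    .reindex(full_index, fill_value=0)  # absent pairs are filled with 0
    .reset_index()
)

print(result)
```

The reindex variant never introduces NaN, so the integer dtype is kept without a cast. It also makes it easy to pass an explicit category list if some categories never occur in the data.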
Upvotes: 1