Pandas: unify the values of a column for each value of another column

Question

I have a DataFrame that looks like this:

  user_id category  frequency
0   user1     cat1          4
1   user2     cat2          1
2   user2     cat3          4
3   user3     cat3          1
4   user3     cat4          3

For each user I have associated categories with their frequencies. In total, there are 4 categories (cat1, cat2, cat3, cat4), and I would like to expand the data of each user by adding the missing categories with frequency equal to zero.

So the expected outcome is:

   user_id category  frequency
0    user1     cat1          4
1    user1     cat2          0
2    user1     cat3          0
3    user1     cat4          0
4    user2     cat1          0
5    user2     cat2          1
6    user2     cat3          4
7    user2     cat4          0
8    user3     cat1          0
9    user3     cat2          0
10   user3     cat3          1
11   user3     cat4          3

So now each user has all 4 categories. Is there a straightforward way to achieve that?

Alexander · Accepted Answer

You can create a pivot table on user_id and category, fill the resulting NaN values with zero, stack the category level (which gives a Series indexed on user_id and category), and then reset the index to get back a flat DataFrame like the desired output.

>>> (df.pivot(index='user_id', columns='category', values='frequency')
     .fillna(0)
     .stack()
     .reset_index())

   user_id category  0
0    user1     cat1  4
1    user1     cat2  0
2    user1     cat3  0
3    user1     cat4  0
4    user2     cat1  0
5    user2     cat2  1
6    user2     cat3  4
7    user2     cat4  0
8    user3     cat1  0
9    user3     cat2  0
10   user3     cat3  1
11   user3     cat4  3
