Caiotru
Caiotru

Reputation: 325

Pandas systematically identify missing multi-index categorical values

I have the following dataframe, in the ID column we can have 2 feedbacks, good or bad. I cannot figure out how to systematically identify if a user is missing a feedback and if it is missing, add a new line with the missing feedback in the level 1 and add 0 to all values.

import pandas as pd

df = {'ID': ['Good','Good','Good', 'Bad', 'Bad', 'Bad', 'Good', 'Good', 'Bad'],
      'USERS' : ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C'],
      'DATE_VIEW': ['16/05/2019','16/05/2019', '16/05/2019', '18/03/2020', '18/03/2020', '18/03/2020', '18/03/2020', '18/03/2020', '18/03/2020'],
      'VALUES': [1, 3, 4, 5, 6, 7, 8, 1, 2]
       }

df = pd.DataFrame(df)
df = pd.pivot_table(df, index=['USERS', 'ID'], columns='DATE_VIEW', values='VALUES', aggfunc='sum', fill_value=0)

enter image description here

this is the expected output:

enter image description here

Upvotes: 2

Views: 210

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

unstack and stack

df.unstack(fill_value=0).stack()

DATE_VIEW   16/05/2019  18/03/2020
USERS ID                          
A     Bad            0           6
      Good           5           8
B     Bad            0          12
      Good           3           1
C     Bad            0           2
      Good           0           0

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try:

idx = pd.MultiIndex.from_product(
    [
        df.index.get_level_values(0).unique(),
        df.index.get_level_values(1).unique(),
    ],
    names=["USERS", "ID"],
)

df = df.reindex(idx, fill_value=0)
print(df)

Prints:

DATE_VIEW   16/05/2019  18/03/2020
USERS ID                          
A     Bad            0           6
      Good           5           8
B     Bad            0          12
      Good           3           1
C     Bad            0           2
      Good           0           0

Upvotes: 3

Related Questions