Roy

Reputation: 867

Adding missing values in Pandas by categories

I'm new to Pandas and I have a data frame of this form:

                 date category  value
0 2017-11-30 13:58:57        A    901
1 2017-11-30 13:59:41        B    905
2 2017-11-30 13:59:41        C    925

The first column is a date; the second column is categorical, with three known categories.

It was generated by:

import pandas as pd

# DataFrame.from_items was removed in pandas 1.0; a plain dict keeps the same column order.
df = pd.DataFrame({'date': ['2017-11-30 13:58:57', '2017-11-30 13:59:41', '2017-11-30 13:59:41'],
                   'category': ['A', 'B', 'C'],
                   'value': [901, 905, 925]})
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

The problem is that not every category is present for each date. I want to add the missing categories, with missing values, to get:

                  date category value
0  2017-11-30 13:58:57        A   901
1  2017-11-30 13:58:57        B   nan
2  2017-11-30 13:58:57        C   nan
3  2017-11-30 13:59:41        A   nan
4  2017-11-30 13:59:41        B   905
5  2017-11-30 13:59:41        C   925

Is there a built-in way to do this without iterating over the rows?

Upvotes: 1

Views: 609

Answers (1)

jezrael

Reputation: 862611

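You can use reindex with a MultiIndex built by MultiIndex.from_product, which enumerates every (date, category) pair:

# Move both key columns into the index so each row is addressed by (date, category).
df = df.set_index(['date','category'])
# Cartesian product of the index levels: every date combined with every category.
cats = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Pairs missing from the original data are added with NaN values.
df = df.reindex(cats).reset_index()
print(df)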
                 date category  value
0 2017-11-30 13:58:57        A  901.0
1 2017-11-30 13:58:57        B    NaN
2 2017-11-30 13:58:57        C    NaN
3 2017-11-30 13:59:41        A    NaN
4 2017-11-30 13:59:41        B  905.0
5 2017-11-30 13:59:41        C  925.0
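
The reindex approach above can also be sketched as a self-contained script for the case where the full list of categories is supplied by hand rather than read from the index levels. This is a minimal sketch, not part of the original answer: the names known_categories, full_index and filled are illustrative, and it assumes each (date, category) pair occurs at most once, since reindex needs a unique index.

```python
import pandas as pd

# Rebuild the question's sample frame with the plain dict constructor.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-11-30 13:58:57",
        "2017-11-30 13:59:41",
        "2017-11-30 13:59:41",
    ]),
    "category": ["A", "B", "C"],
    "value": [901, 905, 925],
})

# Assumption: the complete set of categories is known up front.
known_categories = ["A", "B", "C"]

# Every observed date paired with every known category.
full_index = pd.MultiIndex.from_product(
    [df["date"].unique(), known_categories],
    names=["date", "category"],
)

# Reindexing adds the missing pairs; their value becomes NaN.
filled = (
    df.set_index(["date", "category"])
      .reindex(full_index)
      .reset_index()
)
print(filled)
```

Passing the category list explicitly also covers a category that never appears in the data at all, as long as it is listed in known_categories.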

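Or use unstack followed by stack(dropna=False), which pivots the categories into columns and back while keeping the NaN rows:

df = (df.set_index(['date','category'])['value']
        .unstack()
        .stack(dropna=False)
        .reset_index(name='value'))
print(df)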
                 date category  value
0 2017-11-30 13:58:57        A  901.0
1 2017-11-30 13:58:57        B    NaN
2 2017-11-30 13:58:57        C    NaN
3 2017-11-30 13:59:41        A    NaN
4 2017-11-30 13:59:41        B  905.0
5 2017-11-30 13:59:41        C  925.0
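
The same wide-then-long reshape can be sketched with pivot and melt instead of unstack and stack. This is a different technique from the one in the answer, shown only as an illustration: melt keeps NaN cells by default, so no dropna argument is needed. The names known_categories, wide and filled are illustrative, and the category column is left as plain strings here.

```python
import pandas as pd

# Rebuild the question's sample frame with the plain dict constructor.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-11-30 13:58:57",
        "2017-11-30 13:59:41",
        "2017-11-30 13:59:41",
    ]),
    "category": ["A", "B", "C"],
    "value": [901, 905, 925],
})

# Assumption: the complete set of categories is known up front.
known_categories = ["A", "B", "C"]

# Wide form: one row per date, one column per category, NaN where absent.
wide = (
    df.pivot(index="date", columns="category", values="value")
      .reindex(columns=known_categories)
)

# Back to long form; melt keeps the NaN cells.
filled = (
    wide.reset_index()
        .melt(id_vars="date", var_name="category", value_name="value")
        .sort_values(["date", "category"])
        .reset_index(drop=True)
)
print(filled)
```

The final sort is there because melt emits rows category by category, whereas the desired output is grouped by date.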

Upvotes: 1
