OptimusPrime
OptimusPrime

Reputation: 619

Fill in missing data pandas

How can I fill in the missing data in this dateframe.

Missing values for days when no sales are made. How can I fill in the missing values for days where 0 of an item were sold at a particular store and date?

Input

Dates            Store            Item        Sales           
2017-01-01       Chicago          Apple       10
2017-01-02       NewYork          Pear        10 
2017-01-03       Chicago          Apple       10

Output

Dates            Store            Item        Sales           
2017-01-01       Chicago          Apple       10
2017-01-01       Chicago          Pear        0
2017-01-02       Chicago          Apple       0
2017-01-02       Chicago          Pear        0
2017-01-03       Chicago          Apple       10
2017-01-03       Chicago          Pear        0    
2017-01-01       NewYork          Apple       0
2017-01-01       NewYork          Pear        0 
2017-01-02       NewYork          Apple       0 
2017-01-02       NewYork          Pear        10 
2017-01-03       NewYork          Apple       0 
2017-01-03       NewYork          Pear        0 

Upvotes: 3

Views: 212

Answers (2)

BENY
BENY

Reputation: 323226

It will be a little bit hard to understand, by using set_index,stack and unstack

df.set_index(['Dates','Store','Item']).unstack().stack(dropna=False).\
    unstack(1).stack(dropna=False).fillna(0).reset_index()
Out[258]: 
         Dates   Item    Store  Sales
0   2017-01-01  Apple  Chicago   10.0
1   2017-01-01  Apple  NewYork    0.0
2   2017-01-01   Pear  Chicago    0.0
3   2017-01-01   Pear  NewYork    0.0
4   2017-01-02  Apple  Chicago    0.0
5   2017-01-02  Apple  NewYork    0.0
6   2017-01-02   Pear  Chicago    0.0
7   2017-01-02   Pear  NewYork   10.0
8   2017-01-03  Apple  Chicago   10.0
9   2017-01-03  Apple  NewYork    0.0
10  2017-01-03   Pear  Chicago    0.0
11  2017-01-03   Pear  NewYork    0.0

Upvotes: 1

jezrael
jezrael

Reputation: 862641

Use:


df = df.set_index(['Dates','Store','Item'])
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(mux, fill_value=0).sort_index(level='Store').reset_index()
print (df)
        Dates    Store   Item  Sales
0  2017-01-01  Chicago  Apple     10
1  2017-01-01  Chicago   Pear      0
2  2017-01-02  Chicago  Apple      0
3  2017-01-02  Chicago   Pear      0
4  2017-01-03  Chicago  Apple     10
5  2017-01-03  Chicago   Pear      0
6  2017-01-01  NewYork  Apple      0
7  2017-01-01  NewYork   Pear      0
8  2017-01-02  NewYork  Apple      0
9  2017-01-02  NewYork   Pear     10
10 2017-01-03  NewYork  Apple      0
11 2017-01-03  NewYork   Pear      0

Upvotes: 5

Related Questions