Insert zeros for missing data in a pandas.DataFrame

Question

I have the following kind of pandas.DataFrame:

sales_with_missing = pd.DataFrame({'month': [1, 2, 3, 6, 7, 8, 9, 10, 11, 12], 'code': [111] * 10, 'sales': [np.random.randint(1500) for _ in np.arange(10)]})

As you can see, the records for April and May (months 4 and 5) are missing. I'd like to insert those records with sales set to zero:

sales = insert_zero_for_missing(sales_with_missing)
print(sales)

How can I implement the insert_zero_for_missing function?

unutbu · Accepted Answer

Set the month as the index,
reindex to add rows for the missing months,
call fillna to fill the missing values with zero, and then
reset the index (to make month a column again):

import numpy as np
import pandas as pd

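# Sample data: months 4 and 5 are missing
month = list(range(1, 4)) + list(range(6, 13))
sales = np.array(month) * 100
df = pd.DataFrame(dict(month=month, sales=sales))

# reindex adds NaN rows for months 4 and 5 and fillna(0) sets them to 0;
# the NaNs upcast sales to float, so astype(int) restores integer output
print(df.set_index('month').reindex(range(1, 13)).fillna(0).astype(int).reset_index())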

yields

    month  sales
0       1    100
1       2    200
2       3    300
3       4      0
4       5      0
5       6    600
6       7    700
7       8    800
8       9    900
9      10   1000
10     11   1100
11     12   1200
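The answer's sample data has no code column. Applied to the question's sales_with_missing, a minimal sketch of insert_zero_for_missing could reindex on (code, month) pairs instead, so that inserted rows keep their product code rather than having it filled with 0. It assumes every code should cover months 1 to 12, and it swaps fillna(0) for reindex's fill_value=0 argument, which keeps sales as integers. The months parameter is a hypothetical addition for illustration, not part of the question.

```python
import numpy as np
import pandas as pd


def insert_zero_for_missing(df, months=range(1, 13)):
    """Return df with one row per (code, month); missing sales become 0.

    Sketch only: `months` is a hypothetical parameter defaulting to 1..12.
    """
    # Every combination of the existing codes and the wanted months
    full_index = pd.MultiIndex.from_product(
        [df['code'].unique(), list(months)], names=['code', 'month']
    )
    return (
        df.set_index(['code', 'month'])
        # Rows absent from df are created with sales = 0; fill_value keeps
        # the integer dtype, unlike reindex(...).fillna(0)
        .reindex(full_index, fill_value=0)
        .reset_index()[df.columns]  # restore the original column order
    )


sales_with_missing = pd.DataFrame({
    'month': [1, 2, 3, 6, 7, 8, 9, 10, 11, 12],
    'code': [111] * 10,
    'sales': np.random.randint(1500, size=10),
})
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
```

With several product codes in one DataFrame, each code gets its own full set of months; rows are returned ordered by code, then month.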
