Reputation: 5235
I have a pandas.DataFrame like the following:
import numpy as np
import pandas as pd
sales_with_missing = pd.DataFrame({'month':[1,2,3,6,7,8,9,10,11,12],'code':[111]*10, 'sales':[np.random.randint(1500) for _ in range(10)]})
You can see records for April and May are missing, and I'd like to insert sales as zero for those missing records:
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
How can I implement the insert_zero_for_missing method?
Upvotes: 3
Views: 4448
Reputation: 5414
# create a series of all months
all_months = pd.Series(data = range(1 , 13))
# get the months missing from your data frame; in this example, 4 and 5
missing_months = all_months[~all_months.isin(sales_with_missing.month)]
# create a new data frame of the missing months; it is concatenated to the original data frame in the next step
missing_df = pd.DataFrame({'month' : missing_months.values , 'code' : 111 , 'sales' : 0})
Out[36]:
code month sales
111 4 0
111 5 0
# then concatenate both data frames
# sort_index(by=...) has been removed from pandas; sort_values is the current equivalent
pd.concat([sales_with_missing , missing_df]).sort_values(by = 'month')
Out[39]:
code month sales
111 1 1028
111 2 1163
111 3 961
111 4 0
111 5 0
111 6 687
111 7 31
111 8 607
111 9 1236
111 10 863
111 11 1233
111 12 780
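The steps above can be wrapped into the insert_zero_for_missing function the question asks for. This is only a minimal sketch: the `months` parameter and the choice to take the `code` value from the first row are assumptions of mine, so it only makes sense for a frame holding a single code.

```python
import numpy as np
import pandas as pd


def insert_zero_for_missing(df, months=range(1, 13)):
    """Return a new frame with zero-sales rows added for absent months.

    Assumes df has 'month', 'code' and 'sales' columns and one code value.
    """
    all_months = pd.Series(list(months))
    # months that do not appear in the input frame
    missing = all_months[~all_months.isin(df["month"])]
    missing_df = pd.DataFrame({
        "month": missing.to_numpy(),
        "code": df["code"].iloc[0],  # assumption: a single code per frame
        "sales": 0,
    })
    return (
        pd.concat([df, missing_df], ignore_index=True)
        .sort_values(by="month")
        .reset_index(drop=True)
    )


sales_with_missing = pd.DataFrame({
    "month": [1, 2, 3, 6, 7, 8, 9, 10, 11, 12],
    "code": [111] * 10,
    "sales": np.random.randint(1500, size=10),
})
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
```

Passing ignore_index=True and resetting the index afterwards gives a clean 0..11 index instead of the repeated labels that a plain concat leaves behind.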
Upvotes: 5
Reputation: 880717
You could use set_index to make month the index, reindex to add rows for the missing months, fillna to fill the missing values with zero, and then reset_index (to make month a column again):
import numpy as np
import pandas as pd
month = list(range(1,4)) + list(range(6,13))
sales = np.array(month)*100
df = pd.DataFrame(dict(month=month, sales=sales))
print(df.set_index('month').reindex(range(1,13)).fillna(0).reset_index())
yields
month sales
0 1 100
1 2 200
2 3 300
3 4 0
4 5 0
5 6 600
6 7 700
7 8 800
8 9 900
9 10 1000
10 11 1100
11 12 1200
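One thing to be aware of: reindex inserts NaN for the new rows, which upcasts an integer sales column to float, and fillna(0) does not cast it back. If you want to keep integers, a variant that should work is passing fill_value=0 to reindex directly. The sketch below reuses the same toy data; the name `filled` is just mine for illustration.

```python
import pandas as pd

month = list(range(1, 4)) + list(range(6, 13))
df = pd.DataFrame({"month": month, "sales": [m * 100 for m in month]})

# fill_value=0 fills the new rows directly, so no NaN is introduced
# and the sales column is not upcast to float.
filled = (
    df.set_index("month")
    .reindex(range(1, 13), fill_value=0)
    .reset_index()
)
print(filled)
print(filled.dtypes)
```

Note that fill_value applies to every column, so with an extra column such as code you would still need to set the right value for the new rows afterwards.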
Upvotes: 6