Reputation: 807
I want to check in a column if a certain month of data is missing. My column has all the dates for two years between 2016 and 2017 except June 2017. This column is a purchase date column where when a purchase is made the date is recorded. Only for the month of June 2017, it has not recorded any purchases so I want to do a missing value check and report this missing month in the EDA for my project. The column dtype is datetime64[ns]
I have used the following code but it shows that all the months are missing as the value returned is true. The output should yeild all the missing months from this column in the format MM-YYYY
df1.reindex(pd.period_range(df1.booking_date.min(),
df1.booking_date.max(),
freq= 'M')).isnull().all(1)
The sample data set looks like following:
booking_date
2016-01-16 00:00:00
2016-02-14 00:00:00
2016-03-01 00:00:00
.....
2017-05-01 00:00:00
2017-07-10 00:00:00
so the june month is missing and I want to extract that
Upvotes: 1
Views: 1083
Reputation: 30971
To use reindex on a month periods the index must be of timestamp type, but your index is the default index (consecutive numbers).
To find "missing" months, take the following approach:
Set index to booking_date, then resample by months and compute size of each group:
s = df1.set_index('booking_date').resample('MS').size()
Then limit this result to elements (months) with value of 0 and take their indices (maybe as an "ordinary" list):
s[s == 0].index.tolist()
If you want these missing months just as periods (not starting dates of each month), run:
s[s == 0].index.to_period('M')
Upvotes: 1