ketan

Reputation: 2904

How to find month gaps count from monthly data?

I have a dataframe like the one below:

name,date
AAA,201705
AAA,201706
AAA,201707
AAA,201708
AAA,201710
AAA,201711
AAA,201802
AAA,201803
AAA,201804
AAA,201805
AAA,201806
AAA,201807

The dataframe has two columns, name and date. The date column holds only the year and month, in yyyymm format.

The months 201709, 201712 and 201801 are missing from the date column.

I need to check whether all months are present. If any month is missing, I need the output in the format below:

name,start_date,end_date,count
AAA,201709,201709,1
AAA,201712,201801,2

I am trying to use the pandas diff function.
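For reference, a diff-based approach along these lines could be sketched as follows. This is only an illustration under stated assumptions: the date column is taken to hold integers, and the names `month_no`, `step`, `is_gap`, `to_yyyymm` and `gaps` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AAA"] * 12,
    "date": [201705, 201706, 201707, 201708, 201710, 201711,
             201802, 201803, 201804, 201805, 201806, 201807],
})

# Turn yyyymm into a running month number so consecutive months differ by 1.
df = df.sort_values(["name", "date"])
month_no = (df["date"] // 100) * 12 + (df["date"] % 100) - 1

# Per-name difference to the previous row; a step larger than 1 marks a gap.
step = month_no.groupby(df["name"]).diff()
is_gap = step > 1


def to_yyyymm(n):
    # Inverse of the month-number encoding above.
    return (n // 12) * 100 + (n % 12) + 1


gaps = pd.DataFrame({
    "name": df.loc[is_gap, "name"],
    # First missing month: the month right after the previous present row.
    "start_date": to_yyyymm(month_no[is_gap] - step[is_gap] + 1).astype(int),
    # Last missing month: the month right before the current row.
    "end_date": to_yyyymm(month_no[is_gap] - 1).astype(int),
    "count": (step[is_gap] - 1).astype(int),
}).reset_index(drop=True)
print(gaps)
```

The month-number encoding avoids date parsing entirely, at the cost of a small helper to convert back to yyyymm.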

Upvotes: 2

Views: 256

Answers (1)

jezrael

Reputation: 862591

Use asfreq:

import pandas as pd

# convert the yyyymm column to datetimes (first day of each month)
df['date'] = pd.to_datetime(df['date'], format='%Y%m')
# reindex each name to month-start frequency; missing months become NaN
a = df.set_index('date').groupby('name')['name'].apply(lambda x: x.asfreq('MS'))
# label each run of consecutive NaN rows with the running count of non-NaN rows
b = a.notnull().cumsum()[a.isnull()].reset_index(name='g')

# aggregate the first and last missing month, the name, and the gap size
d = {'date':['first','last'],'name':['first', 'size']}
df = b.groupby('g').agg(d).reset_index(drop=True)
# flatten the MultiIndex columns and rename them
df.columns = df.columns.map('_'.join)
df = df.rename(columns={'date_first':'start_date',
                        'date_last':'end_date',
                        'name_first':'name',
                        'name_size':'count'})
print(df)
  start_date   end_date name  count
0 2017-09-01 2017-09-01  AAA      1
1 2017-12-01 2018-01-01  AAA      2
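The result above holds datetimes, while the question asks for yyyymm values with name as the first column. A minimal sketch of that last formatting step, assuming a frame shaped like the printed result (the name `res` and the hand-built data are stand-ins for this example):

```python
import pandas as pd

# Stand-in for the result printed above (same values, rebuilt by hand).
res = pd.DataFrame({
    "start_date": pd.to_datetime(["2017-09-01", "2017-12-01"]),
    "end_date": pd.to_datetime(["2017-09-01", "2018-01-01"]),
    "name": ["AAA", "AAA"],
    "count": [1, 2],
})

# Format the datetimes back to yyyymm strings and reorder the columns.
for col in ["start_date", "end_date"]:
    res[col] = res[col].dt.strftime("%Y%m")
res = res[["name", "start_date", "end_date", "count"]]
print(res.to_csv(index=False))
```

Note that strftime returns strings; if integers are needed, an extra astype(int) on those two columns should do.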

Detail:

print (a)
name  date      
AAA   2017-05-01    AAA
      2017-06-01    AAA
      2017-07-01    AAA
      2017-08-01    AAA
      2017-09-01    NaN
      2017-10-01    AAA
      2017-11-01    AAA
      2017-12-01    NaN
      2018-01-01    NaN
      2018-02-01    AAA
      2018-03-01    AAA
      2018-04-01    AAA
      2018-05-01    AAA
      2018-06-01    AAA
      2018-07-01    AAA
Name: name, dtype: object
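To see why a.notnull().cumsum()[a.isnull()] gives one label per gap, here is a small sketch on a shortened stand-in for `a` (the data and the names `counter` and `g` are only for illustration):

```python
import pandas as pd

# Small stand-in for `a`: a value where the month exists, None where it is missing.
a = pd.Series(
    ["AAA", None, "AAA", "AAA", None, None, "AAA"],
    index=pd.date_range("2017-08-01", periods=7, freq="MS"),
)

# The running count of non-missing rows only increases on present months,
# so it stays constant across each run of consecutive missing months.
counter = a.notnull().cumsum()

# Keeping the counter only at the missing rows yields one label per gap.
g = counter[a.isnull()]
print(g)
```

Here 2017-09-01 gets label 1, while 2017-12-01 and 2018-01-01 share label 3, so grouping by the label treats the two adjacent missing months as a single gap.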

Upvotes: 3
