raam

Reputation: 241

How to calculate quarterly differences and fill in missing quarters with counts in Python

I have a data frame like the one below. For each Id I need to find the missing quarters between the existing rows, together with a count of each run of consecutive missing quarters:

year    Data   Id  number
2019Q4  57170  A   1
2019Q3  55150  A   1
2019Q2  51109  A   1
2019Q1  51109  A   1
2018Q1  57170  B   5
2018Q4  55150  B   5
2017Q4  51109  C   7
2017Q2  51109  C   7
2017Q1  51109  C   7

Expected output:

Id  Start year  end-year  count  number
B   2018Q2      2018Q3    2      5
C   2017Q3      2017Q3    1      7

How can I achieve this using Python pandas?

Upvotes: 1

Views: 455

Answers (1)

jezrael

Reputation: 862581

Use:

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by quarters with asfreq to add missing periods
df1 = (df.set_index('year')
         .groupby('Id')['Id']
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id   year  val
0   A 2019Q1    A
1   A 2019Q2    A
2   A 2019Q3    A
3   A 2019Q4    A
4   B 2018Q1    B
5   B 2018Q2  NaN
6   B 2018Q3  NaN
7   B 2018Q4    B
8   C 2017Q1    C
9   C 2017Q2    C
10  C 2017Q3  NaN
11  C 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()

#filter only NaN rows and aggregate first, last and count
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
                     .agg(['first','last','size'])
                     .reset_index(level=1, drop=True)
                     .reset_index())
print (df2)
  Id  first   last  size
0  B 2018Q2 2018Q3     2
1  C 2017Q3 2017Q3     1
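The index trick works because `cumsum` over the not-null mask gives every run of consecutive NaNs the same label as the last non-missing row before it. A minimal illustration on a toy Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, 3.0])
m = s.notnull()
# each NaN inherits the cumulative count of non-NaN values seen so far,
# so every run of consecutive NaNs shares one group label
print(m.cumsum().tolist())  # -> [1, 1, 1, 2, 2, 3]
```

Grouping by this label (after filtering to the NaN rows only) is what lets `agg(['first','last','size'])` report each gap's start, end, and length.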

EDIT:

To keep additional columns whose values are constant per group (like number), add them to the groupby:

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by quarters with asfreq to add missing periods
df1 = (df.set_index('year')
         .groupby(['Id','number'])['Id']  # <- added number column
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id  number   year  val
0   A       1 2019Q1    A
1   A       1 2019Q2    A
2   A       1 2019Q3    A
3   A       1 2019Q4    A
4   B       5 2018Q1    B
5   B       5 2018Q2  NaN
6   B       5 2018Q3  NaN
7   B       5 2018Q4    B
8   C       7 2017Q1    C
9   C       7 2017Q2    C
10  C       7 2017Q3  NaN
11  C       7 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()

#filter only NaN rows and aggregate first, last and count
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']  # <- added number column
                     .agg(['first','last','size'])
                     .reset_index(level=2, drop=True)
                     .reset_index())
print (df2)
  Id  number  first   last  size
0  B       5 2018Q2 2018Q3     2
1  C       7 2017Q3 2017Q3     1
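If `resample` behaves differently across pandas versions, the same gap report can be built without resampling: build the full quarter range per group with `pd.period_range` and take the set difference. A self-contained sketch (sample values copied from the question; the loop-based approach and output column names are my own, not from the answer above):

```python
import numpy as np
import pandas as pd

# sample frame from the question
df = pd.DataFrame({
    'year':   ['2019Q4', '2019Q3', '2019Q2', '2019Q1',
               '2018Q1', '2018Q4', '2017Q4', '2017Q2', '2017Q1'],
    'Id':     ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'number': [1, 1, 1, 1, 5, 5, 7, 7, 7],
})
df['year'] = pd.PeriodIndex(df['year'], freq='Q')

rows = []
for (id_, num), g in df.groupby(['Id', 'number']):
    # all quarters between each Id's first and last observation
    full = pd.period_range(g['year'].min(), g['year'].max(), freq='Q')
    missing = full.difference(pd.PeriodIndex(g['year']))
    if missing.empty:
        continue
    # split the missing quarters into runs of consecutive periods
    # (ordinals of adjacent quarters differ by exactly 1)
    run = np.concatenate([[0], np.cumsum(np.diff(missing.asi8) != 1)])
    for k in np.unique(run):
        block = missing[run == k]
        rows.append({'Id': id_, 'number': num, 'start': block[0],
                     'end': block[-1], 'count': len(block)})

gaps = pd.DataFrame(rows)
print(gaps)
```

This yields one row per gap: B is missing 2018Q2-2018Q3 (count 2) and C is missing 2017Q3 (count 1), matching the output above.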

Upvotes: 1
