Reputation: 241
I am having a data frame like this I have to get missing Quarterly value and count between them Same with Quarterly Missing count and fill the data frame is
year Data Id number
2019Q4 57170 A 1
2019Q3 55150 A 1
2019Q2 51109 A 1
2019Q1 51109 A 1
2018Q1 57170 B 5
2018Q4 55150 B 5
2017Q4 51109 C 7
2017Q2 51109 C 7
2017Q1 51109 C 7
Id Start year end-year count
B 2018Q2 2018Q3 2 5
B 2017Q3 2018Q3 1 5
How can I achieve this using python panda
Upvotes: 1
Views: 455
Reputation: 862581
Use:
#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby('Id')['Id']
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id year val
0 A 2019Q1 A
1 A 2019Q2 A
2 A 2019Q3 A
3 A 2019Q4 A
4 B 2018Q1 B
5 B 2018Q2 NaN
6 B 2018Q3 NaN
7 B 2018Q4 B
8 C 2017Q1 C
9 C 2017Q2 C
10 C 2017Q3 NaN
11 C 2017Q4 C
m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
.agg(['first','last','size'])
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id first last size
0 B 2018Q2 2018Q3 2
1 C 2017Q3 2017Q3 1
EDIT:
For new columns with same values add it to groupby
:
#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby(['Id','number'])['Id'] <- added number columns
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id number year val
0 A 1 2019Q1 A
1 A 1 2019Q2 A
2 A 1 2019Q3 A
3 A 1 2019Q4 A
4 B 5 2018Q1 B
5 B 5 2018Q2 NaN
6 B 5 2018Q3 NaN
7 B 5 2018Q4 B
8 C 7 2017Q1 C
9 C 7 2017Q2 C
10 C 7 2017Q3 NaN
11 C 7 2017Q4 C
m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']
.agg(['first','last','size']) <- added number columns
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id g first last size
0 B 5 2018Q2 2018Q3 2
1 C 8 2017Q3 2017Q3 1
Upvotes: 1