Lynn
Lynn

Reputation: 4408

dataframe that reveals missing values

I have a dataframe, df, where I would like output to reveal what dates are missing as well as what ids are missing. I am verifying against ids of aa,bb,cc and dd.

Data

id  date    pwr
aa  Q1.22   10
aa  Q1.22   1
aa  Q2.22   1
aa  Q2.22   5
bb  Q1.22   5
bb  Q1.22   1
bb  Q2.22   1
bb  Q2.22   1
cc  Q1.22   2
cc  Q2.22   2

Desired

id  date     
aa  Q3.22  
aa  Q4.22     
bb  Q3.22   
bb  Q4.22    
cc  Q3.22   
cc  Q4.22   
dd  Q1.22   
dd  Q2.22   
dd  Q3.22   
dd  Q4.22  

Doing

pd.date_range(start = '2022-01-01', end = '2023-01-01' ).difference(df.index)

I believe I can do something like this. I unsure how to implement with quarters. I am still researching. Any suggestion is appreciated.

Upvotes: 0

Views: 38

Answers (1)

BENY
BENY

Reputation: 323386

Try with itertools then merge

import itertools
alldf = pd.DataFrame(list(itertools.product(['aa','bb','cc','dd'],['Q1.22','Q2.22','Q3.22','Q4.22'])),columns=['id','date'])
out = df.merge(alldf,indicator=True,how='right')
missing = out[out['_merge'] =='right_only']
missing
Out[159]: 
    id   date  pwr      _merge
4   aa  Q3.22  NaN  right_only
5   aa  Q4.22  NaN  right_only
10  bb  Q3.22  NaN  right_only
11  bb  Q4.22  NaN  right_only
14  cc  Q3.22  NaN  right_only
15  cc  Q4.22  NaN  right_only
16  dd  Q1.22  NaN  right_only
17  dd  Q2.22  NaN  right_only
18  dd  Q3.22  NaN  right_only
19  dd  Q4.22  NaN  right_only

Upvotes: 1

Related Questions