Sanoj

Reputation: 300

Overwrite whole year with NaN in case of missing data

I would like to manipulate a dataframe and fill it with NaN depending on a condition: as soon as a value is missing in a year, I want the whole year to be filled with NaN values. Let's suppose my dataframe looks like this, with a datetime index:

           Score    
2005-12-29 NaN  
2005-12-30 NaN  
2005-12-31 20.3     
2006-01-02 30.9     
2006-01-02 14.9     
2006-01-03 14.1     
...
2006-12-31 14.8     
2007-01-01 14.7     
2007-01-02 NaN  

Now I would like to write code that detects that values are missing in 2005 and 2007 and therefore overwrites those years with NaN. Does anyone have an idea, tip, or approach for how to do this?

I was thinking that I could select each year and then check whether it has any NaNs. However, I couldn't figure out how to select a year. I tried:

 start_date = "01-01"
 end_date = "12-31"

 mask = (df.index.strftime("%H") >= start_date) & (df.index.strftime("%H") <= end_date)

Upvotes: 0

Views: 50

Answers (2)

FObersteiner

Reputation: 25684

Group by year and set the whole year to NaN if it contains any NaN values:

import pandas as pd
import numpy as np

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'Score': [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan]},
                  index=pd.to_datetime(['2005-12-29', '2005-12-30', '2005-12-31', 
                                        '2006-01-02', '2006-01-02', '2006-01-03', 
                                        '2006-12-31', '2007-01-01', '2007-01-02']))

# for each year: replace the whole group with NaN if any value is missing, else keep it
df['Score'] = df.groupby(df.index.year)['Score'].transform(lambda x: np.nan if x.isna().any() else x)
# df['Score']
# 2005-12-29     NaN
# 2005-12-30     NaN
# 2005-12-31     NaN
# 2006-01-02    30.9
# 2006-01-02    14.9
# 2006-01-03    14.1
# 2006-12-31    14.8
# 2007-01-01     NaN
# 2007-01-02     NaN
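
To check which years would be blanked before overwriting anything, the same grouping can be reduced to one flag per year. This is only a minimal sketch on the sample data from the question; the names `has_gap` and `years_with_gaps` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"Score": [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan]},
    index=pd.to_datetime(["2005-12-29", "2005-12-30", "2005-12-31",
                          "2006-01-02", "2006-01-02", "2006-01-03",
                          "2006-12-31", "2007-01-01", "2007-01-02"]),
)

# One boolean per year: True if that year has at least one missing Score.
has_gap = df["Score"].isna().groupby(df.index.year).any()

# Years that the transform above would overwrite (illustrative name).
years_with_gaps = has_gap[has_gap].index.tolist()
print(years_with_gaps)
```

For the sample frame this should list 2005 and 2007, the two years the question expects to be overwritten.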

To instead fill any month that initially contains a NaN, use a pd.Grouper with monthly frequency, which preserves the year component (so January 2006 and January 2007 stay separate groups):

# note: on pandas >= 2.2 the month-end alias is "ME"; "M" still works there but is deprecated
df['Score'] = df.groupby(pd.Grouper(freq="M"))['Score'].transform(lambda x: np.nan if x.isna().any() else x)
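
The question's sample gives the same result for months and years, so here is a small sketch on made-up data where the two differ. It groups on `df.index.to_period("M")` rather than `pd.Grouper`, which is just an alternative way to build a year-and-month key; the frame `demo` and its values are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one gap in January 2006, none in February 2006.
demo = pd.DataFrame(
    {"Score": [30.9, np.nan, 14.1, 14.8]},
    index=pd.to_datetime(["2006-01-02", "2006-01-03", "2006-02-01", "2006-02-02"]),
)

# One label per row identifying its calendar month, year included.
month_key = demo.index.to_period("M")

# True for every row whose month contains at least one NaN.
month_has_gap = demo["Score"].isna().groupby(month_key).transform("any")

# Blank only the affected month; February is left untouched.
monthly = demo["Score"].mask(month_has_gap)
print(monthly)
```

Grouping by year would blank all four rows here, while the monthly key only blanks the two January rows.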

Upvotes: 2

jezrael

Reputation: 863701

Use Series.mask, which sets values to NaN (its default replacement) wherever the condition is True. The condition is whether the year contains at least one NaN, tested with GroupBy.transform and 'any':

df['Score'] = df['Score'].mask(df['Score'].isna().groupby(df.index.year).transform('any'))
print(df)
            Score
2005-12-29    NaN
2005-12-30    NaN
2005-12-31    NaN
2006-01-02   30.9
2006-01-02   14.9
2006-01-03   14.1
2006-12-31   14.8
2007-01-01    NaN
2007-01-02    NaN

How it works:

print(df['Score'].isna())
2005-12-29     True
2005-12-30     True
2005-12-31    False
2006-01-02    False
2006-01-02    False
2006-01-03    False
2006-12-31    False
2007-01-01    False
2007-01-02     True
Name: Score, dtype: bool

print(df['Score'].isna().groupby(df.index.year).transform('any'))
2005-12-29     True
2005-12-30     True
2005-12-31     True
2006-01-02    False
2006-01-02    False
2006-01-03    False
2006-12-31    False
2007-01-01     True
2007-01-02     True
Name: Score, dtype: bool
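
The two steps above can be wrapped into a small reusable function. This is a sketch rather than a definitive implementation: the name `blank_incomplete_years` is hypothetical, and it assumes the Series has a DatetimeIndex:

```python
import numpy as np
import pandas as pd


def blank_incomplete_years(scores: pd.Series) -> pd.Series:
    """Return a copy in which every year containing a NaN is entirely NaN.

    Hypothetical helper; assumes `scores` has a DatetimeIndex.
    """
    # Step 1: flag missing values, then broadcast "any missing" to each year's rows.
    year_has_gap = scores.isna().groupby(scores.index.year).transform("any")
    # Step 2: replace flagged rows with NaN (the default of Series.mask).
    return scores.mask(year_has_gap)


# Sample data from the question.
scores = pd.Series(
    [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan],
    index=pd.to_datetime(["2005-12-29", "2005-12-30", "2005-12-31",
                          "2006-01-02", "2006-01-02", "2006-01-03",
                          "2006-12-31", "2007-01-01", "2007-01-02"]),
    name="Score",
)

result = blank_incomplete_years(scores)
print(result)
```

Because Series.mask returns a new object, the input stays unchanged; assign the result back (for example `df['Score'] = blank_incomplete_years(df['Score'])`) to overwrite the column.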

Upvotes: 2
