Reputation: 300
I would like to manipulate a DataFrame and fill it with NaN under a specific condition: as soon as a value is missing in a year, I want the whole year to be filled with NaN. Suppose my DataFrame looks like this, with a datetime index:
Score
2005-12-29 NaN
2005-12-30 NaN
2005-12-31 20.3
2006-01-02 30.9
2006-01-02 14.9
2006-01-03 14.1
...
2006-12-31 14.8
2007-01-01 14.7
2007-01-02 NaN
Now I would like to write code that detects that values are missing in 2005 and 2007, and therefore overwrites those years with NaN. Does anyone have an idea, tip, or approach for how to do this?
I was thinking that I could select each year and then check whether it contains any NaNs. However, I couldn't figure out how to select a year. I tried:
start_date = "01-01"
end_date = "12-31"
mask = (df.index.strftime("%H") >= start_date) & (df.index.strftime("%H") <= end_date)
Upvotes: 0
Views: 50
Reputation: 25684
Group by year and set the whole year to NaN if it contains any NaN values:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Score': [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan]},
index=pd.to_datetime(['2005-12-29', '2005-12-30', '2005-12-31',
'2006-01-02', '2006-01-02', '2006-01-03',
'2006-12-31', '2007-01-01', '2007-01-02']))
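# select the column first so transform returns a Series rather than a one-column DataFrame
df['Score'] = df.groupby(df.index.year)['Score'].transform(lambda x: np.nan if x.isnull().values.any() else x)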
# df['Score']
# 2005-12-29 NaN
# 2005-12-30 NaN
# 2005-12-31 NaN
# 2006-01-02 30.9
# 2006-01-02 14.9
# 2006-01-03 14.1
# 2006-12-31 14.8
# 2007-01-01 NaN
# 2007-01-02 NaN
To fill any month that initially contains a NaN, use a pd.Grouper
with monthly frequency to preserve the year component (pandas 2.2 and later spell this alias "ME" and deprecate "M"):
df['Score'] = df.groupby(pd.Grouper(freq="M")).transform(lambda x: np.nan if x.isnull().values.any() else x)
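As a hedged alternative sketch, not taken from the answer above: grouping by `df.index.to_period("M")` also keeps year and month together, and it avoids depending on the frequency alias passed to `pd.Grouper`. The names `month_key` and `monthly` are made up for illustration, and the sample data is the frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Score": [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan]},
    index=pd.to_datetime(["2005-12-29", "2005-12-30", "2005-12-31",
                          "2006-01-02", "2006-01-02", "2006-01-03",
                          "2006-12-31", "2007-01-01", "2007-01-02"]),
)

# One key per row, e.g. Period('2005-12'); year and month stay together.
month_key = df.index.to_period("M")

# Replace every month that contains at least one NaN with NaN.
monthly = df.groupby(month_key)["Score"].transform(
    lambda x: np.nan if x.isna().any() else x
)
print(monthly)
```

With this sample data the monthly result happens to match the yearly one, because every NaN sits in a month that makes up all of that year's rows.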
Upvotes: 2
Reputation: 863701
Use Series.mask, which by default replaces values with NaN wherever the condition is True. Build the condition with GroupBy.transform and 'any', so that it is True for every row of a year containing at least one NaN:
df['Score'] = df['Score'].mask(df['Score'].isna().groupby(df.index.year).transform('any'))
print (df)
Score
2005-12-29 NaN
2005-12-30 NaN
2005-12-31 NaN
2006-01-02 30.9
2006-01-02 14.9
2006-01-03 14.1
2006-12-31 14.8
2007-01-01 NaN
2007-01-02 NaN
How it works:
print (df['Score'].isna())
2005-12-29 True
2005-12-30 True
2005-12-31 False
2006-01-02 False
2006-01-02 False
2006-01-03 False
2006-12-31 False
2007-01-01 False
2007-01-02 True
Name: Score, dtype: bool
print (df['Score'].isna().groupby(df.index.year).transform('any'))
2005-12-29 True
2005-12-30 True
2005-12-31 True
2006-01-02 False
2006-01-02 False
2006-01-03 False
2006-12-31 False
2007-01-01 True
2007-01-02 True
Name: Score, dtype: bool
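If you only need to know which years are affected, as the question asks, the same boolean test can be aggregated instead of transformed. This is a minimal sketch on the sample data from the question. The names `years_with_nan`, `affected` and `cleaned` are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Score": [np.nan, np.nan, 20.3, 30.9, 14.9, 14.1, 14.8, 14.7, np.nan]},
    index=pd.to_datetime(["2005-12-29", "2005-12-30", "2005-12-31",
                          "2006-01-02", "2006-01-02", "2006-01-03",
                          "2006-12-31", "2007-01-01", "2007-01-02"]),
)

# One boolean per year: does that year contain any NaN?
years_with_nan = df["Score"].isna().groupby(df.index.year).any()

# Keep only the years flagged True.
affected = years_with_nan[years_with_nan].index.tolist()
print(affected)  # [2005, 2007]

# Equivalent masking step: blank out every row whose year is affected.
cleaned = df["Score"].mask(df.index.year.isin(affected))
print(cleaned)
```

Using `.any()` returns one row per year, while `.transform('any')` broadcasts that result back to every original row, which is why the latter can be passed straight to `mask`.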
Upvotes: 2