Reputation: 125
I have a dataframe with date
, value
, and isoweek
fields like this:
date | value | isoweek
-----------------------------
2018-04-01 | 5 | 2018-13
2018-04-10 | 10 | 2018-15
2018-05-01 | 10 | 2018-18
where isoweek
is the year-week of the corresponding date. my goal is to iterate over the isoweeks, finding the isoweeks that dont exist in the data and insert a row into the dataframe with 0
as the value.
The expected output would look like this:
date | value | isoweek
-----------------------------
2018-04-01 | 5 | 2018-13
NaN | 0 | 2018-14
2018-04-10 | 10 | 2018-15
NaN | 0 | 2018-16
NaN | 0 | 2018-17
2018-05-01 | 10 | 2018-18
How can I iterate over the original dataframe, and find all the missing isoweeks in the data?
Upvotes: 2
Views: 182
Reputation: 30050
You can use pandas.date_range() to generate a list of dates from start
to end
per one week.
dates = pd.Series(pd.date_range(start=df['date'].min(), end=df['date'].max(), freq='W'))
isoweeks = (dates.dt.isocalendar().year.astype(str) + '-' + dates.dt.isocalendar().week.astype(str)).tolist()
max_isoweek = str(df['date'].max().isocalendar()[0]) + '-' + str(df['date'].max().isocalendar()[1])
if max_isoweek not in isoweeks:
isoweeks.append(max_isoweek)
This is to get all iso weeks between start
date and end
date.
Then you could merge the df into a helper dataframe to get what you want.
df = df.merge(pd.DataFrame({'isoweek': isoweeks}), how='right')
df['value'].fillna(0, inplace=True)
# print(df)
date value isoweek
0 2018-04-01 5.0 2018-13
1 NaT 0.0 2018-14
2 2018-04-10 10.0 2018-15
3 NaT 0.0 2018-16
4 NaT 0.0 2018-17
5 2018-05-01 10.0 2018-18
Upvotes: 2
Reputation: 13349
You can try using apply
:
def func(row):
year = (row.name)
r = row['isoweek'].str.split('-').str[1].astype(int)
min_week = min(r)
max_week = max(r)
val_range = range(min_week, max_week)
missing = (set(val_range) - set(r.values))
for mis_week in missing:
row = (row.append({'isoweek': f"{year}-{mis_week}", 'date': np.nan, 'value':0}, ignore_index=True))
return (row.sort_values(by='isoweek').reset_index(drop=True))
df.groupby(df.isoweek.str.split('-').str[0],as_index=False).apply(func)
date value isoweek
0 0 2018-04-01 5 2018-13
1 NaN 0 2018-14
2 2018-04-10 10 2018-15
3 NaN 0 2018-16
4 NaN 0 2018-17
5 2018-05-01 10 2018-18
Upvotes: 1
Reputation: 75120
Perhaps a bit verbose , but you can try with resample
after converting the isoweek to date:
s = pd.to_datetime(df['isoweek']+"-0",format='%Y-%W-%w')
u = df.set_index(s).resample("W").first()
iso_week = u.index.year.astype(str)+'-'+u.index.weekofyear.astype(str)
u['isoweek'] = u['isoweek'].fillna(pd.Series(iso_week,index=u.index))
out = u.fillna({"value":0}).reset_index(drop=True)
print(out)
date value isoweek
0 2018-04-01 5.0 2018-13
1 NaN 0.0 2018-14
2 2018-04-10 10.0 2018-15
3 NaN 0.0 2018-16
4 NaN 0.0 2018-17
5 2018-05-01 10.0 2018-18
Upvotes: 1