E. Lutins
E. Lutins

Reputation: 125

Iterating over Isoweeks pandas

I have a dataframe with date, value, and isoweek fields like this:

date       | value | isoweek 
-----------------------------
2018-04-01 | 5     | 2018-13
2018-04-10 | 10    | 2018-15
2018-05-01 | 10    | 2018-18

where isoweek is the year-week of the corresponding date. my goal is to iterate over the isoweeks, finding the isoweeks that dont exist in the data and insert a row into the dataframe with 0 as the value.

The expected output would look like this:

date       | value | isoweek 
-----------------------------
2018-04-01 | 5     | 2018-13
NaN        | 0     | 2018-14
2018-04-10 | 10    | 2018-15
NaN        | 0     | 2018-16
NaN        | 0     | 2018-17
2018-05-01 | 10    | 2018-18

How can I iterate over the original dataframe, and find all the missing isoweeks in the data?

Upvotes: 2

Views: 182

Answers (3)

Ynjxsjmh
Ynjxsjmh

Reputation: 30050

You can use pandas.date_range() to generate a list of dates from start to end per one week.

dates = pd.Series(pd.date_range(start=df['date'].min(), end=df['date'].max(), freq='W'))
isoweeks = (dates.dt.isocalendar().year.astype(str) + '-' + dates.dt.isocalendar().week.astype(str)).tolist()
max_isoweek = str(df['date'].max().isocalendar()[0]) + '-' + str(df['date'].max().isocalendar()[1])

if max_isoweek not in isoweeks:
    isoweeks.append(max_isoweek)

This is to get all iso weeks between start date and end date.

Then you could merge the df into a helper dataframe to get what you want.

df = df.merge(pd.DataFrame({'isoweek': isoweeks}), how='right')
df['value'].fillna(0, inplace=True)
# print(df)

        date  value  isoweek
0 2018-04-01    5.0  2018-13
1        NaT    0.0  2018-14
2 2018-04-10   10.0  2018-15
3        NaT    0.0  2018-16
4        NaT    0.0  2018-17
5 2018-05-01   10.0  2018-18

Upvotes: 2

Pygirl
Pygirl

Reputation: 13349

You can try using apply:

def func(row):
    year = (row.name)
    r = row['isoweek'].str.split('-').str[1].astype(int)
    min_week = min(r)
    max_week = max(r)
    val_range = range(min_week, max_week)
    missing = (set(val_range) - set(r.values))
    for mis_week in missing:
        row = (row.append({'isoweek': f"{year}-{mis_week}", 'date': np.nan, 'value':0}, ignore_index=True))
    return (row.sort_values(by='isoweek').reset_index(drop=True))

df.groupby(df.isoweek.str.split('-').str[0],as_index=False).apply(func)

        date        value   isoweek
 0  0   2018-04-01  5       2018-13
    1   NaN         0       2018-14
    2   2018-04-10  10      2018-15
    3   NaN         0       2018-16
    4   NaN         0       2018-17
    5   2018-05-01  10      2018-18

Upvotes: 1

anky
anky

Reputation: 75120

Perhaps a bit verbose , but you can try with resample after converting the isoweek to date:

s = pd.to_datetime(df['isoweek']+"-0",format='%Y-%W-%w')
u = df.set_index(s).resample("W").first()

iso_week = u.index.year.astype(str)+'-'+u.index.weekofyear.astype(str)
u['isoweek'] = u['isoweek'].fillna(pd.Series(iso_week,index=u.index))
out = u.fillna({"value":0}).reset_index(drop=True)

print(out)

         date  value  isoweek
0  2018-04-01    5.0  2018-13
1         NaN    0.0  2018-14
2  2018-04-10   10.0  2018-15
3         NaN    0.0  2018-16
4         NaN    0.0  2018-17
5  2018-05-01   10.0  2018-18

Upvotes: 1

Related Questions