Loutfi

Reputation: 63

Pandas datetime questions: How to insert missing weekends into an existing dates column in a DataFrame in Python

I hope you can help me with this. I am trying to add the missing weekend dates to the df['StartDate'] column. The new rows should keep the data in the other columns, except for Hours, which should show 0 or NaN.

I don't need every missing date between the dates shown in df['StartDate']. I only need to add the Saturdays and Sundays wherever they are missing.

Original Dataframe:

EmployeeId  StartDate  weekday    Hours
111         1/20/2017  Friday     6
111         1/25/2017  Wednesday  5
111         1/30/2017  Monday     2

The desired final output would look like this:

EmployeeId  StartDate  weekday    Hours
111         1/20/2017  Friday     6
111         1/21/2017  Saturday   NaN
111         1/22/2017  Sunday     NaN
111         1/25/2017  Wednesday  5
111         1/28/2017  Saturday   NaN
111         1/29/2017  Sunday     NaN
111         1/30/2017  Monday     2

Upvotes: 2

Views: 1225

Answers (1)

Umar.H

Reputation: 23099

One way is to build a second DataFrame that holds every date between the minimum and maximum of your StartDate column, filter it down to Saturdays and Sundays, and concatenate it with your original frame. Any duplicates can then be dropped with keep="first", which keeps the rows from your original df because they come first in the concatenation.

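import pandas as pd

# Assumes df is the frame from the question; StartDate must be a datetime column.
df["StartDate"] = pd.to_datetime(df["StartDate"])

# Every calendar day between the first and last StartDate.
s = pd.DataFrame(
    {"StartDate": pd.date_range(df.StartDate.min(), df.StartDate.max(), freq="D")}
)

s["weekday"] = s.StartDate.dt.day_name()

# Keep only the weekend days.
s = s.loc[s["weekday"].isin(["Saturday", "Sunday"])]

# df comes first, so keep="first" prefers its rows if a weekend date already exists.
df_new = (
    pd.concat([df, s], sort=False)
    .drop_duplicates(subset="StartDate", keep="first")
    .sort_values("StartDate")
)
print(df_new)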
   EmployeeId  StartDate    weekday  Hours
0       111.0 2017-01-20     Friday    6.0
1         NaN 2017-01-21   Saturday    NaN
2         NaN 2017-01-22     Sunday    NaN
1       111.0 2017-01-25  Wednesday    5.0
8         NaN 2017-01-28   Saturday    NaN
9         NaN 2017-01-29     Sunday    NaN
2       111.0 2017-01-30     Monday    2.0
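
One thing to watch with the de-duplication step: drop_duplicates compares all columns unless you pass subset, so a weekend that is already in df would not match its placeholder row (EmployeeId and Hours differ). Here is a minimal sketch of the difference. The two-row frame and the Saturday with 4 hours are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample in which one Saturday is already recorded.
df = pd.DataFrame(
    {
        "EmployeeId": [111, 111],
        "StartDate": pd.to_datetime(["2017-01-20", "2017-01-21"]),
        "weekday": ["Friday", "Saturday"],
        "Hours": [6, 4],
    }
)

# Weekend placeholder rows for the same date span.
s = pd.DataFrame({"StartDate": pd.date_range("2017-01-20", "2017-01-22", freq="D")})
s["weekday"] = s["StartDate"].dt.day_name()
s = s.loc[s["weekday"].isin(["Saturday", "Sunday"])]

combined = pd.concat([df, s], sort=False)

# Whole-row comparison: the placeholder Saturday differs in EmployeeId and Hours,
# so both Saturday rows survive.
all_columns = combined.drop_duplicates(keep="first")

# Date-only comparison: the original Saturday row (listed first) is the one kept.
by_date = combined.drop_duplicates(subset="StartDate", keep="first")

print(len(all_columns), len(by_date))
```

If your data never contains a weekend date, both calls give the same result.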

To fill the NaN employee IDs with the value from the row above, you can use ffill:

df_new['EmployeeId'] = df_new['EmployeeId'].ffill()
print(df_new)
   EmployeeId  StartDate    weekday  Hours
0       111.0 2017-01-20     Friday    6.0
1       111.0 2017-01-21   Saturday    NaN
2       111.0 2017-01-22     Sunday    NaN
1       111.0 2017-01-25  Wednesday    5.0
8       111.0 2017-01-28   Saturday    NaN
9       111.0 2017-01-29     Sunday    NaN
2       111.0 2017-01-30     Monday    2.0
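
For reference, the steps above can be put together into one runnable sketch. The helper name add_missing_weekends is made up for this example, and it assumes the frame holds a single employee and that StartDate arrives as month/day/year strings, as in the question.

```python
import pandas as pd


def add_missing_weekends(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a NaN-hours row for each missing Saturday/Sunday."""
    out = df.copy()
    # Assumption: dates are strings such as "1/20/2017".
    out["StartDate"] = pd.to_datetime(out["StartDate"], format="%m/%d/%Y")

    # All days in the covered span, reduced to Saturday (5) and Sunday (6).
    days = pd.date_range(out["StartDate"].min(), out["StartDate"].max(), freq="D")
    weekends = pd.DataFrame({"StartDate": days[days.dayofweek >= 5]})
    weekends["weekday"] = weekends["StartDate"].dt.day_name()

    out = (
        pd.concat([out, weekends], sort=False)
        .drop_duplicates(subset="StartDate", keep="first")
        .sort_values("StartDate")
        .reset_index(drop=True)
    )
    # Single-employee assumption: carry the ID down into the new rows.
    out["EmployeeId"] = out["EmployeeId"].ffill().astype(int)
    return out


df = pd.DataFrame(
    {
        "EmployeeId": [111, 111, 111],
        "StartDate": ["1/20/2017", "1/25/2017", "1/30/2017"],
        "weekday": ["Friday", "Wednesday", "Monday"],
        "Hours": [6, 5, 2],
    }
)
result = add_missing_weekends(df)
print(result)
```

With several employees in one frame, a plain ffill could carry an ID across employees, so you would probably want to run the helper once per EmployeeId group instead.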

Upvotes: 3
