Reputation: 367
I have table:
import pandas as pd
df_initial = pd.DataFrame([
("2018-05-25", 18, 14),
("2018-06-04", 19, 16),
("2018-06-15", 19, 18),
("2018-06-24", 21, 20),
("2018-07-10", 23, 23),
("2018-07-20", 25, 25),
("2018-08-01", 27, 29),
("2018-08-10", 28, 32),
("2018-08-22", 29, 35),
("2018-09-03", 29, 37),
("2018-09-25", 31, 48),
("2018-10-17", 34, 55),
("2018-11-10", 38, 63),
("2018-11-11", 39, 64),
("2018-12-10", 48, 77),
("2018-12-11", 49, 78),
("2019-01-11", 57, 88),
("2019-02-10", 63, 103),
("2019-02-24", 67, 111),
("2019-03-10", 69, 113),
("2019-03-11", 70, 115),
("2019-04-10", 80, 149),
("2019-05-11", 88, 209)],
columns=["date", "col1", "col2"])
I need to add NaN rows to table for 10th of each month where is no data for 10th. To make table look like this:
Upvotes: 3
Views: 73
Reputation: 107
Here's my solution using strictly strings (without converting to date), but I am getting 29 rows (instead of 27 rows from your expected sample table):
# create YYYY_MM column for filtering
df_initial["YYYY_MM"] = df_initial["date"].str.slice(0, -3)
# filter dates that DO contain the 10th
df_filtered = df_initial[df_initial['date'].str.endswith("-10")]
# slice off "-10"
df_monthsToFilter = df_filtered["date"].str.slice(0, -3)
# filter out the extraneous
df_filtered2 = df_initial[~df_initial.YYYY_MM.isin(df_monthsToFilter)]
# create df to add data back in
df_toAdd = pd.DataFrame(df_filtered2["YYYY_MM"].unique(), columns=['YYYY_MM'])
df_toAdd['YYYY_MM'] = df_toAdd['YYYY_MM'].astype(str) + "-10"
df_toAdd = df_toAdd.rename(index=str, columns={"YYYY_MM": "date"})
df_initial = df_initial.append(df_toAdd)
# remove YYYY_MM column
df_initial = df_initial.drop(["YYYY_MM"], axis=1)
Upvotes: 0
Reputation: 323226
IIUC using strftime
find the Y-m out , then we crate all month for day 10 , then we filter it by not exit(~isin
) in original df, and drop_duplicates
, concat
back
#df.date=pd.to_datetime(df.date)
s=pd.to_datetime(df.date.dt.strftime('%Y-%m')+'-10').to_frame()
df=pd.concat([df,s.loc[~s.date.isin(df.date),].drop_duplicates()],sort=True).\
sort_values('date').\
reindex(columns=df.columns)
df
date col1 col2
19 2018-03-10 69.0 113.0
0 2018-05-10 NaN NaN
0 2018-05-25 18.0 14.0
1 2018-06-04 19.0 16.0
1 2018-06-10 NaN NaN
2 2018-06-15 19.0 18.0
3 2018-06-24 21.0 20.0
4 2018-07-10 23.0 23.0
5 2018-07-20 25.0 25.0
6 2018-08-01 27.0 29.0
7 2018-08-10 28.0 32.0
8 2018-08-22 29.0 35.0
9 2018-09-03 29.0 37.0
9 2018-09-10 NaN NaN
10 2018-09-25 31.0 48.0
11 2018-10-10 NaN NaN
11 2018-10-17 34.0 55.0
12 2018-11-10 38.0 63.0
13 2018-11-11 39.0 64.0
14 2018-12-10 48.0 77.0
15 2018-12-11 49.0 78.0
16 2019-01-10 NaN NaN
16 2019-01-11 57.0 88.0
17 2019-02-10 63.0 103.0
18 2019-02-24 67.0 111.0
20 2019-03-10 NaN NaN
20 2019-03-11 70.0 115.0
21 2019-04-10 80.0 149.0
22 2019-05-10 NaN NaN
22 2019-05-11 88.0 209.0
Upvotes: 2