Denis
Denis

Reputation: 367

How to add missing rows in table for 10th of each month?

I have table:

import pandas as pd
df_initial = pd.DataFrame([
("2018-05-25", 18, 14),
("2018-06-04", 19, 16),
("2018-06-15", 19, 18),
("2018-06-24", 21, 20),
("2018-07-10", 23, 23),
("2018-07-20", 25, 25),
("2018-08-01", 27, 29),
("2018-08-10", 28, 32),
("2018-08-22", 29, 35),
("2018-09-03", 29, 37),
("2018-09-25", 31, 48),
("2018-10-17", 34, 55),
("2018-11-10", 38, 63),
("2018-11-11", 39, 64),
("2018-12-10", 48, 77),
("2018-12-11", 49, 78),
("2019-01-11", 57, 88),
("2019-02-10", 63, 103),
("2019-02-24", 67, 111),
("2019-03-10", 69, 113),
("2019-03-11", 70, 115),
("2019-04-10", 80, 149),
("2019-05-11", 88, 209)],
columns=["date", "col1", "col2"])

Initial table

I need to add NaN rows to table for 10th of each month where is no data for 10th. To make table look like this:

enter image description here

Upvotes: 3

Views: 73

Answers (2)

Beauregard D
Beauregard D

Reputation: 107

Here's my solution using strictly strings (without converting to date), but I am getting 29 rows (instead of 27 rows from your expected sample table):

# create YYYY_MM column for filtering
df_initial["YYYY_MM"] = df_initial["date"].str.slice(0, -3)

# filter dates that DO contain the 10th
df_filtered = df_initial[df_initial['date'].str.endswith("-10")]

# slice off "-10"
df_monthsToFilter = df_filtered["date"].str.slice(0, -3)

# filter out the extraneous
df_filtered2 = df_initial[~df_initial.YYYY_MM.isin(df_monthsToFilter)]

# create df to add data back in
df_toAdd = pd.DataFrame(df_filtered2["YYYY_MM"].unique(), columns=['YYYY_MM'])
df_toAdd['YYYY_MM'] = df_toAdd['YYYY_MM'].astype(str) + "-10"
df_toAdd = df_toAdd.rename(index=str, columns={"YYYY_MM": "date"})
df_initial = df_initial.append(df_toAdd)

# remove YYYY_MM column
df_initial = df_initial.drop(["YYYY_MM"], axis=1)

Upvotes: 0

BENY
BENY

Reputation: 323226

IIUC using strftime find the Y-m out , then we crate all month for day 10 , then we filter it by not exit(~isin) in original df, and drop_duplicates , concat back

#df.date=pd.to_datetime(df.date)
s=pd.to_datetime(df.date.dt.strftime('%Y-%m')+'-10').to_frame()
df=pd.concat([df,s.loc[~s.date.isin(df.date),].drop_duplicates()],sort=True).\
      sort_values('date').\
         reindex(columns=df.columns)
df
         date  col1   col2
19 2018-03-10  69.0  113.0
0  2018-05-10   NaN    NaN
0  2018-05-25  18.0   14.0
1  2018-06-04  19.0   16.0
1  2018-06-10   NaN    NaN
2  2018-06-15  19.0   18.0
3  2018-06-24  21.0   20.0
4  2018-07-10  23.0   23.0
5  2018-07-20  25.0   25.0
6  2018-08-01  27.0   29.0
7  2018-08-10  28.0   32.0
8  2018-08-22  29.0   35.0
9  2018-09-03  29.0   37.0
9  2018-09-10   NaN    NaN
10 2018-09-25  31.0   48.0
11 2018-10-10   NaN    NaN
11 2018-10-17  34.0   55.0
12 2018-11-10  38.0   63.0
13 2018-11-11  39.0   64.0
14 2018-12-10  48.0   77.0
15 2018-12-11  49.0   78.0
16 2019-01-10   NaN    NaN
16 2019-01-11  57.0   88.0
17 2019-02-10  63.0  103.0
18 2019-02-24  67.0  111.0
20 2019-03-10   NaN    NaN
20 2019-03-11  70.0  115.0
21 2019-04-10  80.0  149.0
22 2019-05-10   NaN    NaN
22 2019-05-11  88.0  209.0

Upvotes: 2

Related Questions