MAC

Reputation: 1515

How to add rows to a pandas DataFrame with None values in some columns

I have a data-frame which looks like:

 month  desc value
201911   a  1164
201912   a   971
202001   a  1125
202005   b  1549
202005   a  1038
202006   b  1244
202006   a  1475
202007   a   960

My month column ranges from 201911 to 202007. The desc column holds categorical values. For each value of desc, I want a row for every month in that range; if a month is not present for that desc, I want to add it and fill the value column with None.

Expected Output:

month   desc value
201911   a  1164
201912   a   971
202002   a   None
202003   a   None
202004   a   None
202001   a  1125
202005   b  1549
202005   a  1038
202006   b  1244
202006   a  1475
202007   a   960
201911   b  None
201912   b  None
202001   b  None
202002   b  None
202003   b  None
202004   b  None
202007   b  None

Now every desc value has a row for all the months.

Is there any smarter way to do this?

Upvotes: 1

Views: 340

Answers (1)

jezrael

Reputation: 862511

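You can use DataFrame.reindex with a new MultiIndex built from the product of every month between the minimum and maximum month (generated with date_range) and the unique desc values. Combinations missing from the original data get NaN in value: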

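import pandas as pd

# parse YYYYMM values (cast to str first so integer columns also work)
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y%m')
desc = df['desc'].unique()
# every month start between the first and last month present in the data
d = pd.date_range(df['month'].min(), df['month'].max(), freq='MS')

df = df.set_index(['month', 'desc'])

# all (month, desc) combinations; pairs missing from df get NaN after reindex
mux = pd.MultiIndex.from_product([d, desc], names=['month', 'desc'])
df = df.reindex(mux).reset_index()
# convert back to the original YYYYMM string format
df['month'] = df['month'].dt.strftime('%Y%m')
print(df)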
     month desc   value
0   201911    a  1164.0
1   201911    b     NaN
2   201912    a   971.0
3   201912    b     NaN
4   202001    a  1125.0
5   202001    b     NaN
6   202002    a     NaN
7   202002    b     NaN
8   202003    a     NaN
9   202003    b     NaN
10  202004    a     NaN
11  202004    b     NaN
12  202005    a  1038.0
13  202005    b  1549.0
14  202006    a  1475.0
15  202006    b  1244.0
16  202007    a   960.0
17  202007    b     NaN
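
The question asks for None rather than NaN. The following is a minimal sketch of one way to get that, assuming the sample data from the question; the helper name fill_missing_months is hypothetical and not part of pandas. It wraps the reindex approach above and then stores the values in an object-dtype column, since a float column would turn None back into NaN:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "month": [201911, 201912, 202001, 202005, 202005, 202006, 202006, 202007],
    "desc": ["a", "a", "a", "b", "a", "b", "a", "a"],
    "value": [1164, 971, 1125, 1549, 1038, 1244, 1475, 960],
})


def fill_missing_months(frame):
    """Return one row per (month, desc) pair; missing values become None."""
    out = frame.copy()
    out["month"] = pd.to_datetime(out["month"].astype(str), format="%Y%m")

    # Every month start between the first and last month in the data.
    months = pd.date_range(out["month"].min(), out["month"].max(), freq="MS")
    mux = pd.MultiIndex.from_product(
        [months, out["desc"].unique()], names=["month", "desc"]
    )

    # Reindexing adds the missing (month, desc) pairs with NaN in "value".
    out = out.set_index(["month", "desc"]).reindex(mux).reset_index()
    out["month"] = out["month"].dt.strftime("%Y%m")

    # Rebuild "value" as an object column so that None is kept as None.
    out["value"] = pd.Series(
        [None if pd.isna(v) else int(v) for v in out["value"]],
        index=out.index,
        dtype=object,
    )
    return out


result = fill_missing_months(df)
print(result)
```

Keeping NaN is usually more convenient for numeric work, because an object column loses fast vectorized arithmetic; the conversion to None is mainly useful when the data is exported to something that expects null values.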

Upvotes: 1
