Reputation: 1515
I have a DataFrame which looks like:
month desc value
201911 a 1164
201912 a 971
202001 a 1125
202005 b 1549
202005 a 1038
202006 b 1244
202006 a 1475
202007 a 960
My month column ranges from 201911 to 202007. The column desc has categorical values.
For each value of desc, I want a row for every month, and if a month is not present I will fill the value column with None.
Expected Output:
month desc value
201911 a 1164
201912 a 971
202002 a None
202003 a None
202004 a None
202001 a 1125
202005 b 1549
202005 a 1038
202006 b 1244
202006 a 1475
202007 a 960
201911 b None
201912 b None
202001 b None
202002 b None
202003 b None
202004 b None
202007 b None
Now every desc value has a row for every month.
Is there any smarter way to do this?
Upvotes: 1
Views: 340
Reputation: 862511
You can use DataFrame.reindex with a new MultiIndex created from date_range, using the minimal and maximal values of the month column after converting it from strings to datetimes:
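import pandas as pd

# parse YYYYMM values as month-start datetimes
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
desc = df['desc'].unique()
# every month start between the first and last month in the data
d = pd.date_range(df['month'].min(), df['month'].max(), freq='MS')
df = df.set_index(['month', 'desc'])
# all (month, desc) combinations; pairs missing from df become NaN
mux = pd.MultiIndex.from_product([d, desc], names=['month','desc'])
df = df.reindex(mux).reset_index()
df['month'] = df['month'].dt.strftime('%Y%m')
print(df)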
month desc value
0 201911 a 1164.0
1 201911 b NaN
2 201912 a 971.0
3 201912 b NaN
4 202001 a 1125.0
5 202001 b NaN
6 202002 a NaN
7 202002 b NaN
8 202003 a NaN
9 202003 b NaN
10 202004 a NaN
11 202004 b NaN
12 202005 a 1038.0
13 202005 b 1549.0
14 202006 a 1475.0
15 202006 b 1244.0
16 202007 a 960.0
17 202007 b NaN
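The output above shows value upcast to float, because NaN cannot be stored in a plain integer column. If keeping integers matters, one option is the nullable Int64 dtype, which displays gaps as <NA>. Below is a minimal self-contained sketch of the same reindex approach with that extra step. The sample data is copied from the question; the function name fill_missing_months and the Int64 conversion are my own additions, not part of the original answer. It assumes each (month, desc) pair occurs at most once, since reindex does not work on a duplicated index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "month": [201911, 201912, 202001, 202005, 202005, 202006, 202006, 202007],
    "desc": ["a", "a", "a", "b", "a", "b", "a", "a"],
    "value": [1164, 971, 1125, 1549, 1038, 1244, 1475, 960],
})


def fill_missing_months(frame):
    """Return one row per (month, desc) pair; absent pairs get a missing value.

    Hypothetical helper for illustration. Assumes (month, desc) pairs are unique.
    """
    out = frame.copy()
    # Cast to str first so the %Y%m format also applies when month is stored as int.
    out["month"] = pd.to_datetime(out["month"].astype(str), format="%Y%m")
    # Every month start between the first and last month present in the data.
    months = pd.date_range(out["month"].min(), out["month"].max(), freq="MS")
    full_index = pd.MultiIndex.from_product(
        [months, out["desc"].unique()], names=["month", "desc"]
    )
    out = out.set_index(["month", "desc"]).reindex(full_index).reset_index()
    out["month"] = out["month"].dt.strftime("%Y%m")
    # reindex upcasts value to float to hold NaN; the nullable Int64 dtype
    # keeps the integers and marks the gaps as <NA>.
    out["value"] = out["value"].astype("Int64")
    return out


result = fill_missing_months(df)
print(result)
```

If real Python None objects are required rather than <NA>, the column would have to be object dtype, which gives up fast numeric operations, so the nullable dtype is usually the more practical choice.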
Upvotes: 1