Reputation: 421
Here is the data. For each person, it gives the monetary contribution they made for that month:
df.head()
| year_mo | name | contribution|
| 202001 | Fey | 30 |
| 202003 | Fey | 20 |
| 202004 | Fey | 10 |
| 202001 | Jay | 30 |
| 202003 | Jay | 20 |
I want to generate a continuous series of year_mo for each person from 202001-202006 where if the person didn't make a contribution, it would be 0 like this :
| year_mo | name | contribution|
| 202001 | Fey | 30 |
| 202002 | Fey | 0 |
| 202003 | Fey | 20 |
| 202004 | Fey | 10 |
| 202005 | Fey | 0 |
| 202006 | Fey | 0 |
| 202001 | Jay | 30 |
| 202002 | Jay | 0 |
| 202003 | Jay | 20 |
| 202004 | Jay | 0 |
| 202005 | Jay | 0 |
| 202006 | Jay | 0 |
In order to do this where I tried to do an outer join with the continuous year month df, I tried the following but didn't get the desired output:
years = list(range(2020,2021,1))
months= list(range(1,7,1))
rows_list = []
for yr in years:
for mn in months:
dict1 = {}
dict1 = {'key' :1, 'year_mo' : (yr*100)+mn}
rows_list.append(dict1)
yearmo_df = pd.DataFrame(rows_list)
df['key'] = 1
new_df = pd.merge(left=df, right=yearmo_df, left_on='year_mo' ,right_on='year_mo', how='outer')
Upvotes: 1
Views: 31
Reputation: 150735
We can use MultiIndex
and reindex
:
new_idx = pd.MultiIndex.from_product([np.arange(202001,202007), df['name'].unique()],
names=['year_mo','name'])
(df.set_index(['year_mo', 'name'])
.reindex(new_idx, fill_value=0)
.sort_index(level=['name','year_mo'])
.reset_index()
)
Another optino is pivot_table
followed by re_index
, and stack
:
(df.pivot_table(index='year_mo', columns='name', fill_value=0)
.reindex(np.arange(202001,202007), fill_value=0)
.stack()
.reset_index()
)
Output:
year_mo name contribution
0 202001 Fey 30
1 202002 Fey 0
2 202003 Fey 20
3 202004 Fey 10
4 202005 Fey 0
5 202006 Fey 0
6 202001 Jay 30
7 202002 Jay 0
8 202003 Jay 20
9 202004 Jay 0
10 202005 Jay 0
11 202006 Jay 0
Upvotes: 1