Python: Pandas Dataframe, groupby but keeping otherwise missing values

Question

My dataframe is looking something like this:

       date    |  status  |  value
0      2020-01 |  true    |  3
1      2020-01 |  true    |  8
2      2020-02 |  true    |  5
3      2020-01 |  false   |  1
4      2020-01 |  false   |  11

For most functions I would call groupby...

df_group = df.groupby(['date', 'status'])[['value']].agg('sum').reset_index()

...and obtain this:

       date    |  status  |  value
0      2020-01 |  true    |  11
1      2020-02 |  true    |  5
2      2020-01 |  false   |  12

But what I'm trying to reach is a dataframe, where also the 'missing' values are listed, like this:

       date    |  status  |  value
0      2020-01 |  true    |  11
1      2020-02 |  true    |  5
2      2020-01 |  false   |  12
3      2020-02 |  false   |  0

Is groupby even the correct funtion?

jezrael · Accepted Answer

Use Series.unstack with DataFrame.stack, also remove [[]] and instead agg is possible use only sum:

df_group = (df.groupby(['date', 'status'])['value']
              .sum()
              .unstack(fill_value=0)
              .stack()
              .reset_index(name='value'))
print (df_group)
      date  status  value
0  2020-01   False     12
1  2020-01    True     11
2  2020-02   False      0
3  2020-02    True      5

Python: Pandas Dataframe, groupby but keeping otherwise missing values

Answers (1)

Related Questions