t.pellegrom
t.pellegrom

Reputation: 333

Fill missing combinations with "ones" in a groupby object

I have data in the following format.

date        group   ret
1986-01-31  1       1.3
1986-01-31  1       0.9
1986-01-31  2       1.4
1986-01-31  2       1.6
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.3
1986-02-28  2       1.1

I want to get the average return per date and group, which I get by doing:

output = df.groupby(['date', 'group'])['ret'].mean() + 1 
output = output.reset_index()

Which gives the following output:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.2

However, since no "ret" was given at date 1986-02-28 for class 1 and 3, there is no row in the output for class 1 and 3 on this date. What I would like is that for any combination of dates and classes where no return is given in the original dataframe, this combination gets and output of "1" in the output. So, the required output is:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  1       1
1986-02-28  2       1.2
1986-02-28  3       1

What would be a good solution for this problem? Thanks in advance!

Upvotes: 2

Views: 451

Answers (3)

sammywemmy
sammywemmy

Reputation: 28709

You could use the complete function from pyjanitor to expose the explicitly missing values, and fillna with 1:

# pip install pyjanitor
import janitor
(df.groupby(['date', 'group'], as_index = False)
   .ret
   .mean()
   .complete(['date', 'group'])
   .fillna(1)
 )

         date  group  ret
0  1986-01-31      1  1.1
1  1986-01-31      2  1.5
2  1986-01-31      3  1.1
3  1986-02-28      1  1.0
4  1986-02-28      2  1.2
5  1986-02-28      3  1.0

Alternatively, you could convert the group column to a categorical dtype, all categories will be maintained during the groupby:

from pandas.api.types import CategoricalDtype
(df
 .astype({"group": CategoricalDtype(categories=df.group.unique())})
 .groupby(['date', 'group'], as_index = False)
 .ret
 .mean()
 .fillna(1)
 )

         date group  ret
0  1986-01-31     1  1.1
1  1986-01-31     2  1.5
2  1986-01-31     3  1.1
3  1986-02-28     1  1.0
4  1986-02-28     2  1.2
5  1986-02-28     3  1.0

Upvotes: 1

PieCot
PieCot

Reputation: 3639

You can reindex the result of the groupby and mean and fill the null values with ones:

output = df.groupby(['date', 'group'])['ret'].mean().reindex(
    pd.MultiIndex.from_product(
        (pd.date_range(df.date.min(), df.date.max(), freq='M'),
         sorted(df.group.unique())),
        names=['date', 'group'],
    )
).fillna(1).reset_index()

Here the result for the DataFrame in your question:

        date  group  ret
0 1986-01-31      1  1.1
1 1986-01-31      2  1.5
2 1986-01-31      3  1.1
3 1986-02-28      1  1.0
4 1986-02-28      2  1.2
5 1986-02-28      3  1.0

Upvotes: 2

BENY
BENY

Reputation: 323356

We can do pivot_table then stack

out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0

Upvotes: 4

Related Questions