Reputation: 333
I have data in the following format.
date group ret
1986-01-31 1 1.3
1986-01-31 1 0.9
1986-01-31 2 1.4
1986-01-31 2 1.6
1986-01-31 2 1.5
1986-01-31 3 1.1
1986-02-28 2 1.3
1986-02-28 2 1.1
I want to get the average return per date and group, which I get by doing:
output = df.groupby(['date', 'group'])['ret'].mean() + 1
output = output.reset_index()
Which gives the following output:
date group ret
1986-01-31 1 1.1
1986-01-31 2 1.5
1986-01-31 3 1.1
1986-02-28 2 1.2
However, since no "ret" was given at date 1986-02-28 for class 1 and 3, there is no row in the output for class 1 and 3 on this date. What I would like is that for any combination of dates and classes where no return is given in the original dataframe, this combination gets and output of "1" in the output. So, the required output is:
date group ret
1986-01-31 1 1.1
1986-01-31 2 1.5
1986-01-31 3 1.1
1986-02-28 1 1
1986-02-28 2 1.2
1986-02-28 3 1
What would be a good solution for this problem? Thanks in advance!
Upvotes: 2
Views: 451
Reputation: 28709
You could use the complete function from pyjanitor to expose the explicitly missing values, and fillna with 1
:
# pip install pyjanitor
import janitor
(df.groupby(['date', 'group'], as_index = False)
.ret
.mean()
.complete(['date', 'group'])
.fillna(1)
)
date group ret
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
Alternatively, you could convert the group
column to a categorical dtype, all categories will be maintained during the groupby:
from pandas.api.types import CategoricalDtype
(df
.astype({"group": CategoricalDtype(categories=df.group.unique())})
.groupby(['date', 'group'], as_index = False)
.ret
.mean()
.fillna(1)
)
date group ret
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
Upvotes: 1
Reputation: 3639
You can reindex the result of the groupby
and mean
and fill the null values with ones:
output = df.groupby(['date', 'group'])['ret'].mean().reindex(
pd.MultiIndex.from_product(
(pd.date_range(df.date.min(), df.date.max(), freq='M'),
sorted(df.group.unique())),
names=['date', 'group'],
)
).fillna(1).reset_index()
Here the result for the DataFrame in your question:
date group ret
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
Upvotes: 2
Reputation: 323356
We can do pivot_table
then stack
out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
date group value
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
Upvotes: 4