Reputation: 129
I created this simple example to illustrate my issue:
x = pd.DataFrame({"int_var1": range(3), "int_var2": range(3, 6), "cat_var": pd.Categorical(["a", "b", "a"]), "value": [0.1, 0.2, 0.3]})
it yields this DataFrame:
int_var1 int_var2 cat_var value
0 3 a 0.1
1 4 b 0.2
2 5 a 0.3
where the first two columns are integers, the third column is categorical with two levels, and the fourth column is floats. The issue is that when I try to use groupby followed by agg it seems I only have two options, either I can show no unobserved values like so:
x.groupby(['int_var1', 'int_var2', 'cat_var'], observed = True).agg({"value": "sum"}).fillna(0)
int_var1 int_var2 cat_var value
0 3 a 0.1
1 4 b 0.2
2 5 a 0.3
or I can show unobserved values for all grouping variables like so:
x.groupby(['int_var1', 'int_var2', 'cat_var'], observed = False).agg({"value": "sum"}).fillna(0)
int_var1 int_var2 cat_var value
0 3 a 0.1
b 0.0
4 a 0.0
b 0.0
5 a 0.0
b 0.0
1 3 a 0.0
b 0.0
4 a 0.0
b 0.2
5 a 0.0
b 0.0
2 3 a 0.0
b 0.0
4 a 0.0
b 0.0
5 a 0.3
b 0.0
Is there a way to show unobserved values for the categorical variables only and not every possible permutation of all grouping variables?
Upvotes: 0
Views: 171
Reputation: 150735
You can unstack
the level of interest, cat_var
in this case:
(x.groupby(['int_var1', 'int_var2', 'cat_var'],observed=True)
.agg({'value':'sum'})
.unstack('cat_var',fill_value=0)
)
Output:
value
cat_var a b
int_var1 int_var2
0 3 0.1 0.0
1 4 0.0 0.2
2 5 0.3 0.0
Upvotes: 1