Reputation: 400
In R, I find the following very useful when dealing with many variables:
library(dplyr)
dat <- group_by(mtcars, cyl)
summarize(dat, across(c('mpg','disp'), sum), across(c('drat','wt','qsec'), mean))
# A tibble: 3 x 5
cyl disp hp drat wt
<dbl> <dbl> <dbl> <dbl> <dbl>
1 4 1156. 909 4.07 2.29
2 6 1283. 856 3.59 3.12
3 8 4943. 2929 3.23 4.00
Or even better, selecting with pseudo-regex
summarize(dat, across(ends_with('p'), sum), across(ends_with('t'), mean))
In pandas, the equivalent seems to pass variables one-by-one into a dictionary, eg from this gist:
group_agg = df.groupby("group1").agg({
"var1" : ["mean"],
"var2" : ["sum"],
"var3" : ["mean"]
})
Is there a less verbose way to do this operation in pandas, or with some other package?
Upvotes: 4
Views: 806
Reputation: 28659
For the first scenario, pandas concat
suffices :
dat = df.groupby("cyl")
pd.concat([dat[["mpg", "disp"]].sum(),
dat[["drat", "wt", "qsec"]].mean()],
axis=1)
For the regex/string processing part, verbose is unavoidable :
cols_p = [col for col in df.columns if col.endswith("p")]
cols_t = [col for col in df.columns if col.endswith("t")]
pd.concat((dat[cols_p].sum(), dat[cols_t].mean()), axis=1)
disp hp drat wt
cyl
4 1156.5 909 4.070909 2.285727
6 1283.2 856 3.585714 3.117143
8 4943.4 2929 3.229286 3.999214
Another option, which IMO is simpler, is to build the dictionary before aggregating -inspired by @RichieV:
aggs = {key:"mean"
if key.endswith("t")
else "sum"
for key in dat
if key.endswith(('p', 't'))}
dat.groupby('cyl').agg(aggs)
disp hp drat wt
cyl
4 1156.5 909 4.070909 2.285727
6 1283.2 856 3.585714 3.117143
8 4943.4 2929 3.229286 3.999214
Upvotes: 5
Reputation: 3825
Using datar
, it is very easy to transition your R code to python code without learning APIs of pandas:
>>> from datar import f
>>> from datar.dplyr import summarize, group_by, across, ends_with
>>> from datar.base import sum, mean
>>> from datar.datasets import mtcars
>>>
>>> dat = group_by(mtcars, f.cyl)
>>> dat >> summarize(across(ends_with('p'), sum), across(ends_with('t'), mean))
cyl disp hp drat wt
0 4 1156.5 909 4.070909 2.285727
1 6 1283.2 856 3.585714 3.117143
2 8 4943.4 2929 3.229286 3.999214
I am the author of the package. Feel free to submit issues if you have any questions.
Upvotes: 3
Reputation: 5183
Assuming you want this for a wide dataframe with many aggregation target columns, a dictionary comprehension could mimic the regex:
group_agg = df.groupby('group1').agg(
{var: 'sum' if var[-1]=='p' else 'mean' for var in cols})
In general you can build dict-comprehension with other conditions such as dtypes, str in string, count of regex patterns, etc.
Upvotes: 3