user9894926

Reputation: 23

Julia - How to aggregate many columns by group

Working with Julia 1.0

I am trying to aggregate (in this case, mean-center) several columns by group, and I am looking for a way to loop over the columns rather than writing out every column name explicitly. The code below works, but I would like a more succinct syntax for cases with many columns.

using DataFrames, Statistics
dd=DataFrame(A=["aa";"aa";"bb";"bb"], B=[1.0;2.0;3.0;4.0], C=[5.0;5.0;10.0;10.0])

by(dd, :A, df -> DataFrame(bm = df[:B].-mean(df[:B]), cm = df[:C].-mean(df[:C])))

Is there a way to loop over [:B, :C] and not write the statement separately for each?

Upvotes: 1

Views: 775

Answers (1)

Bogumił Kamiński

Reputation: 69949

You can use aggregate:

julia> centered(col) = col .- mean(col)
centered (generic function with 1 method)

julia> aggregate(dd, :A, centered)
4×3 DataFrame
│ Row │ A      │ B_centered │ C_centered │
│     │ String │ Float64    │ Float64    │
├─────┼────────┼────────────┼────────────┤
│ 1   │ aa     │ -0.5       │ 0.0        │
│ 2   │ aa     │ 0.5        │ 0.0        │
│ 3   │ bb     │ -0.5       │ 0.0        │
│ 4   │ bb     │ 0.5        │ 0.0        │

Note that the function name is used as the suffix of each new column name. If you need custom suffixes, use by and pass it a more elaborate third argument: a function that iterates over the columns of interest and gives each result an appropriate name.
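
As a rough sketch of that second suggestion (not code from the original answer): assuming the DataFrames.jl release that was current for Julia 1.0, where by and df[:B]-style indexing are still available (later releases removed by), the function passed to by can build its result from a list of columns. The variable cols and the "_ctr" suffix are made-up names used only for illustration.

```julia
using DataFrames, Statistics

dd = DataFrame(A=["aa", "aa", "bb", "bb"], B=[1.0, 2.0, 3.0, 4.0], C=[5.0, 5.0, 10.0, 10.0])

# Hypothetical names: `cols` lists the columns to center, "_ctr" is the custom suffix.
cols = [:B, :C]

# Assumes the old `by` API: the do-block is called once per group with that group's rows.
res = by(dd, :A) do df
    DataFrame([df[c] .- mean(df[c]) for c in cols],   # one centered vector per column
              [Symbol(c, "_ctr") for c in cols])      # matching custom column names
end
```

Adding a column to cols is then enough to include it in the result; the grouping column A is prepended by by itself.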

Upvotes: 3
