Let's say I have a DataFrame with several categorical columns and a "value" column, and I want to aggregate by some of the categorical columns, ignoring the others.
In Julia DataFrames there is the function aggregate, but if I leave out some categorical columns I get an error, because it tries to apply the function (here, a sum) to them as well instead of just ignoring them:
In:
using DataArrays, DataFrames
df = DataFrame(
    colour = ["green","blue","white","green","green"],
    shape = ["circle", "triangle", "square","square","circle"],
    border = ["dotted", "line", "line", "line", "dotted"],
    area = [1.1, 2.3, 3.1, 4.2, 5.2])
Out:
colour shape border area
1 green circle dotted 1.1
2 blue triangle line 2.3
3 white square line 3.1
4 green square line 4.2
5 green circle dotted 5.2
In:
aggregate(df,[:colour,:shape, :border],sum) # Ok
aggregate(df,[:colour,:shape],sum) # what I would like, ignoring border column
Out:
LoadError: MethodError: no method matching +(::String, ::String)
Obviously I could just remove the extra columns before the aggregation, but is there a way to do it in a single step?
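From the split-apply-combine section of the DataFrames.jl documentation (https://juliastats.github.io/DataFrames.jl/split_apply_combine/):
by(df, [:colour,:shape]) do sdf
    # sdf is the sub-DataFrame for one (colour, shape) group;
    # only :area is summed, so :border is simply ignored
    DataFrame(m = sum(sdf[:area]))
end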
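The `aggregate` and `by` functions used above come from older DataFrames.jl releases; as far as I know both were deprecated and later removed. Below is a minimal sketch of the same aggregation, assuming DataFrames.jl 1.x and its `groupby`/`combine` API. The output column name `area_sum` is my own choice, not something from the question.

```julia
using DataFrames

df = DataFrame(
    colour = ["green", "blue", "white", "green", "green"],
    shape  = ["circle", "triangle", "square", "square", "circle"],
    border = ["dotted", "line", "line", "line", "dotted"],
    area   = [1.1, 2.3, 3.1, 4.2, 5.2],
)

# Group on the two key columns only; :border is not mentioned anywhere,
# so it is neither a grouping key nor aggregated.
gdf = groupby(df, [:colour, :shape])

# source => function => target: sum :area within each group and
# store it in a new column named :area_sum (name chosen for illustration).
result = combine(gdf, :area => sum => :area_sum)

println(result)
```

Only the columns named in the `source => function => target` pairs are aggregated, so there is no need to drop `:border` beforehand. Note that the row order of the result should not be relied on unless `sort=true` is passed to `groupby`.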