Antonello

Reputation: 6423

How to make the aggregate function "ignore" columns?

Let's say I have a dataframe with several categorical dimensions and a "value" dimension and I want to aggregate by some of them, ignoring the others.

In Julia DataFrames there is the function aggregate, but if I leave out some of the categorical columns I get an error, because it tries to apply the function (here, a sum) to them as well instead of just ignoring them:

In:

using DataArrays, DataFrames
df = DataFrame(
  colour = ["green","blue","white","green","green"],
  shape  = ["circle", "triangle", "square","square","circle"],
  border = ["dotted", "line", "line", "line", "dotted"],
  area   = [1.1, 2.3, 3.1, 4.2, 5.2])

Out:

    colour  shape       border  area
1   green   circle      dotted  1.1
2   blue    triangle    line    2.3
3   white   square      line    3.1
4   green   square      line    4.2
5   green   circle      dotted  5.2

In:

aggregate(df, [:colour, :shape, :border], sum) # OK
aggregate(df, [:colour, :shape], sum)          # what I would like, ignoring the border column

Out:

LoadError: MethodError: no method matching +(::String, ::String)

Obviously I could just remove the extra columns before the aggregation, but maybe there is a way to do it in a single step?

Upvotes: 3

Views: 266

Answers (1)

AndreiR

Reputation: 314

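From the split-apply-combine section of the DataFrames documentation (https://juliastats.github.io/DataFrames.jl/split_apply_combine/): use by instead of aggregate. It only touches the columns you reference inside the block, so border is ignored: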

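# Group by colour and shape; `sub` is the sub-dataframe for each group.
by(df, [:colour, :shape]) do sub
    # Only area is summed; border is never referenced, so it is ignored.
    DataFrame(m = sum(sub[:area]))
end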
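
For readers on a newer DataFrames.jl release, where (as far as I know) by and aggregate are no longer available, the same idea can be sketched with groupby and combine. This is a minimal sketch that assumes DataFrames 1.x; the name result is only illustrative and is not part of the original question.

```julia
using DataFrames

# Same sample data as in the question.
df = DataFrame(
    colour = ["green", "blue", "white", "green", "green"],
    shape  = ["circle", "triangle", "square", "square", "circle"],
    border = ["dotted", "line", "line", "line", "dotted"],
    area   = [1.1, 2.3, 3.1, 4.2, 5.2])

# Group by the two key columns and sum only :area.
# :border is never referenced, so it is dropped from the result
# and sum is never applied to its strings.
result = combine(groupby(df, [:colour, :shape]), :area => sum => :area)
```

Because the aggregation names the column it operates on, there is no need to remove the other columns beforehand.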

Upvotes: 3
