Francis Smart

Reputation: 4055

Julia: Create summary values for column x for each unique value in column y of DataFrame

I would like to apply functions such as mean and variance to column x of my DataFrame for each unique value in column y. I could write a loop that manually subsets the DataFrame, but I'd rather not reinvent the wheel for what is likely a common feature.

using DataFrames, Random   # Random provides randstring on Julia 0.7 and later
mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))
# I could imagine a function that looks like (hypothetical, not an existing API):
# apply(f = mean, across = mydf.x, by = mydf.y)
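
For comparison, the manual loop the question describes can be sketched with only the standard library: collect the x values for each distinct y into a Dict, then apply each summary function to every group. This is a minimal sketch, not a DataFrames.jl API; the helper group_summary and the variable stats are hypothetical names chosen for illustration.

```julia
using Random, Statistics

# Hypothetical helper (not part of any library): apply each function in `fs`
# to the x values that share the same key in `y`.
function group_summary(y::AbstractVector, x::AbstractVector, fs...)
    groups = Dict{eltype(y), Vector{eltype(x)}}()
    for (k, v) in zip(y, x)
        # Create an empty vector the first time a key is seen, then append
        push!(get!(() -> eltype(x)[], groups, k), v)
    end
    # Map each key to a tuple (f1(xs), f2(xs), ...)
    return Dict(k => map(f -> f(xs), fs) for (k, xs) in groups)
end

y = [randstring(1) for i in 1:1000]
x = rand(1000)
stats = group_summary(y, x, mean, var)   # key => (mean, variance)
```

This works, but it is exactly the bookkeeping that DataFrames' grouping functions take care of for you.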

Upvotes: 2

Views: 1027

Answers (1)

mbauman

Reputation: 31342

You're right, this is very common. Take a look at the split-apply-combine chapter in the DataFrames documentation. There are two approaches here: you can use the more general by function to specify exactly which columns to operate over, or you can use the handy aggregate function, which applies the function to all the other columns and names the results sensibly (x becomes x_mean):

julia> aggregate(mydf, :y, mean)
62×2 DataFrames.DataFrame
│ Row │ y   │ x_mean   │
├─────┼─────┼──────────┤
│ 1   │ "0" │ 0.454196 │
│ 2   │ "1" │ 0.541434 │
│ 3   │ "2" │ 0.36734  │
⋮
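
Recent releases of DataFrames.jl (1.x) no longer provide aggregate or by; the same split-apply-combine is written with groupby followed by combine. The sketch below assumes DataFrames.jl 1.x is installed. It computes both statistics from the question, then shows the function-per-group form that plays the role of the old by. The names gdf, summ and summ2 are illustrative only.

```julia
using DataFrames, Random, Statistics

mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))
gdf = groupby(mydf, :y)   # split: one group per unique value of y

# apply + combine with column => function pairs;
# the result columns are auto-named x_mean and x_var
summ = combine(gdf, :x => mean, :x => var)
sort!(summ, :y)

# Counterpart of the old `by`: a function called once per sub-DataFrame,
# returning a NamedTuple whose fields become the result columns
summ2 = combine(gdf) do sdf
    (x_mean = mean(sdf.x), x_var = var(sdf.x))
end
```

The pair syntax is the shorter choice when each statistic depends on a single column; the do-block form is useful when one summary needs several columns of the group at once.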

Upvotes: 2
