AvocadoToast

Reputation: 63

Sum of Julia Dataframe column where values of another column are in a list

How do I write a line of Julia code that sums the values of col2 for the rows where the value of col1 is in list? I'm pretty new to Julia, and trying the following line raises this error: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 10 and 3

total_sum = sum(df[ismember(df[:, :col1], list), :col2])

Upvotes: 0

Views: 162

Answers (3)

Andre Wildberg

Reputation: 19191

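Not exactly sure if this is what you're asking, but try intersect. Note that the values returned by intersect are used here as row positions, so this only gives the right answer when the values of a coincide with the row numbers, as they do in this example.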

julia> using DataFrames

julia> df = DataFrame(a = 1:5, b = 2:6)
5×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      3
   3 │     3      4
   4 │     4      5
   5 │     5      6

julia> list = collect(3:10);

julia> sum(df.b[intersect(df.a, list)])
15
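To see how indexing with the result of intersect differs from selecting by membership, here is a minimal sketch. Plain vectors stand in for the DataFrame columns, and the data is hypothetical, chosen so that the values of a are not the row numbers.

```julia
# Hypothetical data: the values in a are NOT the row numbers 1, 2, 3.
a = [3, 1, 2]
b = [10, 20, 30]
list = [3]

# intersect returns the matching values ([3]); indexing b with them
# treats those values as positions, so this picks b[3].
by_position = sum(b[intersect(a, list)])   # 30

# A Boolean mask selects the rows where the value of a is in list,
# so this picks b[1], the row where a == 3.
by_mask = sum(b[a .∈ Ref(list)])           # 10
```

The two results differ here, which suggests the mask form is the safer choice when col1 holds arbitrary values.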

Upvotes: 0

Przemyslaw Szufel

Reputation: 42264

Depending on what you want to do, filter! is also worth knowing (it drops the non-matching rows from df in place; filter returns a new DataFrame instead). Using the code from Dan Getz's answer:

julia> sum(filter!(:x1 => x1 -> x1 ∈ [2,3], df).x2)
13
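As a rough sketch of the difference between the two functions, assuming DataFrames.jl is installed and using a small hypothetical table with the same x1 and x2 values as in Dan Getz's answer:

```julia
using DataFrames

# Hypothetical table with the same x1/x2 values as the example above.
df = DataFrame(x1 = 1:4, x2 = 5:8)
list = [2, 3]

# filter returns a new DataFrame and leaves df untouched.
total = sum(filter(:x1 => in(list), df).x2)
rows_after_filter = nrow(df)        # still 4

# filter! removes the non-matching rows from df itself.
total_inplace = sum(filter!(:x1 => in(list), df).x2)
rows_after_filter! = nrow(df)       # now 2
```

If df is needed again afterwards, the non-mutating filter is probably the better fit.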

Upvotes: 1

Dan Getz

Reputation: 18227

One way could be:

julia> df = DataFrame(reshape(1:12,4,3),:auto)
4×3 DataFrame
 Row │ x1     x2     x3    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      5      9
   2 │     2      6     10
   3 │     3      7     11
   4 │     4      8     12

julia> list = [2,3]
2-element Vector{Int64}:
 2
 3

julia> sum(df.x2[df.x1 .∈ Ref(list)])
13

This uses broadcasting on in (how ismember is written in Julia), which can also be written as ∈. Ref(list) is used to prevent broadcasting over list.
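A short sketch of why Ref matters here, using plain vectors in place of df.x1 and df.x2 (the variable names are only for illustration). Without Ref, broadcasting tries to pair the column with the list element by element, which is the likely source of the DimensionMismatch in the question.

```julia
# Plain vectors standing in for df.x1 and df.x2 from the example above.
x1 = [1, 2, 3, 4]
x2 = [5, 6, 7, 8]
list = [2, 3]

# Without Ref, x1 and list are broadcast against each other; their
# lengths (4 and 2) differ, so Julia throws a DimensionMismatch.
failed = try
    x1 .∈ list
    false
catch err
    err isa DimensionMismatch
end

# Ref(list) makes the list act as a single value, so every element
# of x1 is tested against the whole list.
mask = x1 .∈ Ref(list)     # [false, true, true, false]
total = sum(x2[mask])      # 6 + 7 == 13

# Equivalent spellings of the same membership test.
mask_in = in.(x1, Ref(list))
mask_curried = in(list).(x1)
```

All three spellings produce the same mask; which one to use is mostly a matter of taste.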

Upvotes: 2
