ecjb

Reputation: 5449

Julia: how to compute a particular operation on certain columns of a Dataframe

I have the following Dataframe

using DataFrames, Statistics
df = DataFrame(name=["John", "Sally", "Kirk"], 
               age=[23., 42., 59.],
               children=[3,5,2], height = [180, 150, 170])

print(df)

3×4 DataFrame
│ Row │ name   │ age     │ children │ height │
│     │ String │ Float64 │ Int64    │ Int64  │
├─────┼────────┼─────────┼──────────┼────────┤
│ 1   │ John   │ 23.0    │ 3        │ 180    │
│ 2   │ Sally  │ 42.0    │ 5        │ 150    │
│ 3   │ Kirk   │ 59.0    │ 2        │ 170    │

I can compute the mean of a column as follows:

println(mean(df[:4]))
166.66666666666666

Now I want to get the mean of all the numeric columns, so I tried this code:

x = [2,3,4]
for i in x
  print(mean(df[:x[i]]))
end

But got the following error message:

MethodError: no method matching getindex(::Symbol, ::Int64)

Stacktrace:
 [1] top-level scope at ./In[64]:3

How can I solve the problem?

Upvotes: 0

Views: 491

Answers (3)

Przemyslaw Szufel

Reputation: 42194

Here is a one-liner that actually selects all Number columns:

julia> mean.(eachcol(df[findall(x-> x<:Number, eltypes(df))]))
3-element Array{Float64,1}:
  41.333333333333336
   3.3333333333333335
 166.66666666666666
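The one-liner above relies on eltypes and on indexing a DataFrame with a single argument, both of which belong to older DataFrames.jl releases. As a hedged sketch, assuming a DataFrames.jl 1.x installation, the same numeric-column selection can be written with names(df, Number) and two-argument indexing. The variable names numeric_cols, col_means and means_df are only illustrative.

```julia
using DataFrames, Statistics

# Same data as in the question
df = DataFrame(name=["John", "Sally", "Kirk"],
               age=[23., 42., 59.],
               children=[3, 5, 2], height=[180, 150, 170])

# Names of all columns whose element type is a subtype of Number
numeric_cols = names(df, Number)

# Broadcast mean over the selected columns (returns a vector of means)
col_means = mean.(eachcol(df[!, numeric_cols]))
println(col_means)

# Alternative: a one-row DataFrame with columns age_mean, children_mean, height_mean
means_df = combine(df, numeric_cols .=> mean)
println(means_df)
```

The combine form keeps the column names attached to the results, which can be easier to read than a bare vector.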

For many scenarios describe is actually more convenient:

julia> describe(df)
4×8 DataFrame
│ Row │ variable │ mean    │ min  │ median │ max   │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union…  │ Any  │ Union… │ Any   │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼─────────┼──────┼────────┼───────┼─────────┼──────────┼──────────┤
│ 1   │ name     │         │ John │        │ Sally │ 3       │          │ String   │
│ 2   │ age      │ 41.3333 │ 23.0 │ 42.0   │ 59.0  │         │          │ Float64  │
│ 3   │ children │ 3.33333 │ 2    │ 3.0    │ 5     │         │          │ Int64    │
│ 4   │ height   │ 166.667 │ 150  │ 170.0  │ 180   │         │          │ Int64    │

Upvotes: 2

hckr

Reputation: 5583

You are trying to access the DataFrame's column using an integer index specifying the column's position. Use the integer value directly, without any : in front of it. Writing :i would create the symbol :i, and you do not have a column named i.

x = [2,3,4]
for i in x
  println(mean(df[i])) # no need for `x[i]`
end

You can also index a DataFrame using a Symbol denoting the column's name.

x = [:age, :children, :height];

for c in x
    println(mean(df[c]))
end
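Both loops above use single-argument indexing (df[i], df[c]), which worked in the DataFrames.jl version used in the question. In case you are on DataFrames.jl 1.x (an assumption about your setup), a minimal sketch of the same loops uses a row selector as the first index, where ! returns the stored column without copying:

```julia
using DataFrames, Statistics

# Same data as in the question
df = DataFrame(name=["John", "Sally", "Kirk"],
               age=[23., 42., 59.],
               children=[3, 5, 2], height=[180, 150, 170])

# Access columns by position
for i in [2, 3, 4]
    println(mean(df[!, i]))
end

# Access columns by name
for c in [:age, :children, :height]
    println(mean(df[!, c]))
end
```

Using df[:, i] instead of df[!, i] would return a copy of the column, which makes no difference for computing a mean.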

You get the following error in your attempt because you are trying to access the ith index of the symbol :x, which is an undefined operation.

MethodError: no method matching getindex(::Symbol, ::Int64)
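To see why this happens without involving the DataFrame at all, here is a small sketch that inspects how Julia parses :x[i]. The variable names ex and err are only illustrative.

```julia
# Parse the expression from the question without evaluating it
ex = Meta.parse(":x[i]")

# The result is an indexing expression whose target is the quoted symbol :x,
# i.e. Julia reads it as (:x)[i] and not as :(x[i])
println(ex.head)
println(ex.args[1] isa QuoteNode)

# Indexing into a Symbol has no getindex method, hence the MethodError
err = try
    (:x)[1]
catch e
    e
end
println(err isa MethodError)
```

So the : applies only to x, and the [i] is then applied to the resulting Symbol.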

Note that :4 is just 4.

julia> :4
4

julia> typeof(:4)
Int64

Upvotes: 2

ecjb

Reputation: 5449

In the question println(mean(df[4])) works as well (instead of println(mean(df[:4]))).

Hence we can write

x = [2,3,4]
for i in x
  println(mean(df[i]))
end

which works.

Upvotes: 0
