Malthus

Reputation: 578

Taking an expression as an argument in Julia function

I'm trying to implement OLS regression in Julia as a learning exercise. One feature I would like is accepting a formula as an argument (e.g. `formula = Y ~ x1 + x2`, where Y, x1, and x2 are columns in a DataFrame). Here is an existing example.

How do I "map" the formula/expression to the correct DataFrame columns?

Upvotes: 2

Views: 270

Answers (3)

Alain

Reputation: 883

Here's a minimal example using the Boston dataset from ISLR, regressing medv on lstat. (Check p. 111 of ISLR if you want to verify that the weight vector is correct.)

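julia> using DataFrames, RDatasets   # newer releases: @formula, ModelFrame and ModelMatrix come from StatsModels
julia> df = dataset("MASS", "Boston")
julia> fm = @formula(MedV ~ LStat)   # regress MedV on LStat

julia> mf = ModelFrame(fm, df)       # resolves the formula terms against the columns of df
julia> X = ModelMatrix(mf).m         # design matrix, including the intercept column
julia> y = Array(df[:MedV])          # response vector (df.MedV in newer DataFrames)
julia> w = X \ y                     # least-squares solution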

2-element Array{Float64,1}:
34.5538  
-0.950049

For more information: http://dataframesjl.readthedocs.io/en/latest/formulas.html

Upvotes: 0

Jeremy

Reputation: 631

Pass an anonymous function as an argument.

julia> using DataFrames
julia> f = (x, y) -> x[:A] .* y[:B]  # anonymous function that picks the columns to combine
julia> x = DataFrame(A = 6)
julia> y = DataFrame(B = 7)
julia> OLS(x::DataFrame, y::DataFrame, f::Function) = f(x, y)  # apply the supplied function
julia> OLS(x, y, f)
1-element DataArrays.DataArray{Int64,1}:
  42
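
To show how the same idea could carry over to an actual regression, here is a minimal sketch in which the caller passes a function that builds the design matrix and response. The names `ols_with` and `select` are hypothetical, and a NamedTuple of vectors stands in for a DataFrame (the `d.x1` column access is the assumption that carries over).

```julia
using LinearAlgebra

# `select` is a hypothetical callback: it receives the data and
# returns the pair (X, y) to regress on.
function ols_with(select::Function, data)
    X, y = select(data)
    return X \ y              # least-squares solution
end

# Toy data constructed so that Y = 1 + 2*x1 + 3*x2 exactly.
data = (Y  = [3.0, 8.0, 7.0, 12.0],
        x1 = [1.0, 2.0, 3.0, 4.0],
        x2 = [0.0, 1.0, 0.0, 1.0])

# The anonymous function decides which columns enter the model.
select = d -> (hcat(ones(length(d.Y)), d.x1, d.x2), d.Y)

coef = ols_with(select, data)  # approximately [1.0, 2.0, 3.0]
```

This keeps the regression routine independent of the column names, at the cost of the caller writing the column selection by hand instead of a formula.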

Upvotes: 1

aviks

Reputation: 2097

Formulas in the Julia statistics packages are implemented with a macro. A macro is defined for the ~ symbol, so a formula is parsed by Julia into expressions instead of being evaluated. Once parsed, the two sides are stored as the lhs and rhs fields of a composite type called Formula.

The details of the implementation, which is relatively simple, can be seen in the DataFrames.jl source code here: https://github.com/JuliaStats/DataFrames.jl/blob/725a22602b8b3f6413e35ebdd707b69c4ed7b659/src/statsmodels/formula.jl
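
As a rough illustration of that mechanism, and not the DataFrames.jl implementation itself, the sketch below captures `Y ~ x1 + x2` in a macro, stores the column names, and looks them up in the data. Everything named here (`SimpleFormula`, `@simple_formula`, `rhs_terms`, `design`, `ols`) is hypothetical, only sums of plain column names are supported, and a NamedTuple of vectors stands in for a DataFrame.

```julia
using LinearAlgebra

# Hypothetical stand-in for the Formula type: just the column names.
struct SimpleFormula
    lhs::Symbol          # response column
    rhs::Vector{Symbol}  # predictor columns
end

# Walk a right-hand side such as :(x1 + x2) and collect the symbols.
function rhs_terms(ex)
    ex isa Symbol && return [ex]
    if ex isa Expr && ex.head == :call && ex.args[1] == :(+)
        return reduce(vcat, [rhs_terms(a) for a in ex.args[2:end]])
    end
    error("unsupported term: $ex")
end

# The macro receives the formula as an unevaluated expression,
# so Y, x1 and x2 never need to exist as variables.
macro simple_formula(ex)
    is_formula = ex isa Expr && ex.head == :call &&
                 length(ex.args) == 3 && ex.args[1] == :(~)
    is_formula || error("expected a formula such as `Y ~ x1 + x2`")
    lhs = ex.args[2]
    lhs isa Symbol || error("left-hand side must be a single column name")
    rhs = rhs_terms(ex.args[3])
    return :(SimpleFormula($(QuoteNode(lhs)), Symbol[$(QuoteNode.(rhs)...)]))
end

# Map the stored names to columns and build the design matrix.
function design(f::SimpleFormula, data)
    y = float.(getproperty(data, f.lhs))
    cols = [float.(getproperty(data, name)) for name in f.rhs]
    X = hcat(ones(length(y)), cols...)   # intercept column first
    return X, y
end

function ols(f::SimpleFormula, data)
    X, y = design(f, data)
    return X \ y                         # least-squares solution
end

# Toy data constructed so that Y = 1 + 2*x1 + 3*x2 exactly.
data = (Y  = [3.0, 8.0, 7.0, 12.0],
        x1 = [1.0, 2.0, 3.0, 4.0],
        x2 = [0.0, 1.0, 0.0, 1.0])

f = @simple_formula(Y ~ x1 + x2)
coef = ols(f, data)                      # approximately [1.0, 2.0, 3.0]
```

The lookup goes through `getproperty`, so the same `design` function should also accept any table type that exposes its columns as properties.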

Upvotes: 2
