Julia: Create DataFrame column from expression?

Question

Given this:

dict = Dict(("y" => ":x / 2"))

df = DataFrame(x = [1, 2, 3, 4])

df
4×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 2     │
│ 3   │ 3     │
│ 4   │ 4     │

I want to make this:

4×2 DataFrame
│ Row │ x     │ y       │
│     │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 0.5     │
│ 2   │ 2     │ 1.0     │
│ 3   │ 3     │ 1.5     │
│ 4   │ 4     │ 2.0     │

This seems like a perfect application for DataFramesMeta, either @with or @eachrow, but I haven't been able to get my expression to evaluate as expected in an environment where :x exists.

Basically, I want to be able to iterate over (k, v) pairs in dict and create one new column for each Symbol(k) with corresponding values eval(Meta.parse(v)), or something along those lines, where the evaluation occurs such that Symbols like :x exist at the time of evaluation.

I didn't expect this to work, and it doesn't:

[df[Symbol(k)] = eval(Meta.parse(v)) for (k, v) in dict]

ERROR: MethodError: no method matching /(::Symbol, ::Int64)

But this illustrates the problem: I need the expressions to be evaluated in an environment where the symbols they contain exist.
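Looking at what `Meta.parse` returns shows where the error comes from. In the parsed expression, `:x` is a `QuoteNode` that wraps the Symbol `:x`. It is a literal symbol, not a reference to a column, so evaluating the expression calls `/` on a Symbol and an integer. This is a minimal Base-only sketch; the names `ex` and `err` are only for illustration:

```julia
# Parse the string stored in the dictionary.
ex = Meta.parse(":x / 2")

# The result is a call expression: the function `/` applied to two arguments.
println(ex.head)                 # call
println(ex.args[1])              # /
# The first argument is a QuoteNode wrapping the Symbol :x, i.e. a literal
# symbol, not a reference to a variable or a column named x.
println(typeof(ex.args[2]))      # QuoteNode
println(ex.args[2].value)        # x
println(ex.args[3])              # 2

# Evaluating it therefore computes `/(:x, 2)`, for which no method exists.
err = try
    eval(ex)
catch e
    e
end
println(err isa MethodError)     # true
```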

However, moving it inside a @with doesn't work:

using DataFramesMeta

@with(df, [eval(Meta.parse(v)) for (k, v) in dict])

ERROR: MethodError: no method matching /(::Symbol, ::Int64)

Using @eachrow fails the same way:

using DataFramesMeta

@eachrow df begin
           for (k, v) in dict
               @newcol tmp::Vector{Float32}
               tmp = eval(Meta.parse(v))
           end
       end

ERROR: MethodError: no method matching /(::Symbol, ::Int64)

I'm guessing I'm unclear on some key element of how DataFramesMeta creates an environment within a DataFrame. I also don't necessarily have to use DataFramesMeta for this, any reasonably concise option will work since I can encapsulate it in a package function.

Note: I control the format of the strings to be parsed into expressions, but I want to avoid complexity such as specifying the name of the DataFrame object in the string, or broadcasting every operation. I want the expression syntax in the initial string to be reasonably clear to non-Julia programmers.

UPDATE: I tried all three solutions in the comments on this question, and they have a problem: they don't work inside functions.

dict = Dict(("y" => ":x / 2"))

data = DataFrame(x = [1, 2, 3, 4])


function transform_from_dict(df, dict)

    new = eval(Meta.parse("@transform(df, " * join(join.(collect(dict), " = "), ", ") * ")"))

    return new

end

transform_from_dict(data, dict)

ERROR: UndefVarError: df not defined

Or:

function transform_from_dict!(df, dict)

    [df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v))))) for (k, v) in dict]

    return nothing

end

transform_from_dict!(data, dict)

ERROR: UndefVarError: df not defined
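Both failures have the same cause. `eval` always evaluates in the global scope of the module, never in the local scope of the calling function, so a name like `df` inside the evaluated expression is looked up as a global. The small Base-only sketch below contrasts referring to a local variable by name with splicing its value into the expression using `$`. The functions `local_lookup` and `spliced_value` and the variable `val` are hypothetical:

```julia
# `eval` evaluates at global (module) scope, so a local variable referenced
# *by name* inside the evaluated expression is not visible.
function local_lookup()
    val = 21
    try
        return eval(:(val * 2))      # looks for a *global* named `val`
    catch e
        return e                     # UndefVarError (assuming no global `val`)
    end
end

# Interpolating with `$` embeds the current *value* of `val` in the
# expression itself, so nothing has to be looked up by name.
function spliced_value()
    val = 21
    return eval(:($val * 2))
end

println(local_lookup() isa UndefVarError)   # true
println(spliced_value())                    # 42
```

The same idea carries over to DataFrames. Interpolate the DataFrame object itself (`$df`) into the `@with` or `@transform` expression, rather than writing its name.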

questionto42 · Accepted Answer

I worked on this answer in parallel with @Ajar; nothing is copied from that answer, and I did not know about it while writing. I was completely new to Julia, so I installed it first (I assumed the online compilers would not know about DataFrames). Later I understood that these packages have to be loaded at the start anyway, online or offline. I have included the package setup that beginners might need.

using Pkg 
Pkg.add("DataFrames")
Pkg.add("DataFramesMeta")

using DataFrames
using DataFramesMeta 
dict = Dict(("y" => ":x / 2"))
df = DataFrame(x = [1, 2, 3, 4])

The @with solution:

julia> function transform_from_dict!(k, v)
           global df
           df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v)))))
           return nothing
       end

transform_from_dict! (generic function with 2 methods)

julia> [transform_from_dict!(k, v) for (k, v) in dict]

1-element Array{Nothing,1}:
 nothing

julia> df

4×2 DataFrame
 Row │ x      y
     │ Int64  Float64
─────┼────────────────
   1 │     1      0.5
   2 │     2      1.0
   3 │     3      1.5
   4 │     4      2.0

The @transform solution:

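julia> function transform_from_dict(df, dict)
           # `eval` runs in global scope and cannot see the local argument `df`,
           # so splice the DataFrame object itself (`$df`) into the macro call
           # instead of its name.
           args = [Meta.parse(k * " = " * v) for (k, v) in dict]
           return eval(:(@transform($df, $(args...))))
       end

transform_from_dict (generic function with 1 method)

julia> transform_from_dict(df, dict)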
4×2 DataFrame
 Row │ x      y
     │ Int64  Float64
─────┼────────────────
   1 │     1      0.5
   2 │     2      1.0
   3 │     3      1.5
   4 │     4      2.0

Thanks go to the other commenters; the essential ideas are also listed in @Ajar's answer.
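If DataFramesMeta is not required, another option is to compile each string once into an anonymous function of the columns it mentions, then call that function with the column vectors. The sketch below rests on two assumptions:

- every `:name` in a string refers to a column, so a quoted symbol used for anything else would be misread;
- the source object exposes its columns through `getproperty`, which is true for both a `DataFrame` (`df.x`) and a `NamedTuple`.

The helpers `column_refs`, `unquote` and `eval_column` are hypothetical names. Wrapping the body in `@.` makes every operation elementwise, so the strings do not need explicit dots:

```julia
# Collect the column names written as quoted symbols (e.g. :x) in an expression.
function column_refs(ex)
    ex isa QuoteNode && return Symbol[ex.value]
    ex isa Expr || return Symbol[]
    return unique(reduce(vcat, map(column_refs, ex.args); init = Symbol[]))
end

# Replace every quoted symbol :x by the plain variable name x.
unquote(ex) = ex isa QuoteNode ? ex.value :
              ex isa Expr ? Expr(ex.head, map(unquote, ex.args)...) : ex

# Evaluate an expression string such as ":x / 2" against the columns of `src`.
function eval_column(src, str::AbstractString)
    ex = Meta.parse(str)
    cols = column_refs(ex)
    # Build `(x, ...) -> @. body`; `@.` turns every call in the body into a
    # broadcast, so ":x * :x" is computed elementwise.
    body = Expr(:macrocall, Symbol("@__dot__"), LineNumberNode(0), unquote(ex))
    f = eval(Expr(:->, Expr(:tuple, cols...), Expr(:block, body)))
    # `f` was defined after this call started, so call it via `invokelatest`
    # to avoid a world-age error.
    return Base.invokelatest(f, (getproperty(src, c) for c in cols)...)
end

data = (x = [1, 2, 3, 4],)
println(eval_column(data, ":x / 2"))        # [0.5, 1.0, 1.5, 2.0]
println(eval_column(data, ":x * :x + 1"))   # [2, 5, 10, 17]
```

With DataFrames loaded, a loop such as `for (k, v) in dict; df[!, Symbol(k)] = eval_column(df, v); end` would add one column per entry. Nothing in it refers to a global, so the same code should also work inside a package function.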
