Reputation: 1826
Given this:
dict = Dict(("y" => ":x / 2"))
df = DataFrame(x = [1, 2, 3, 4])
df
4×1 DataFrame
│ Row │ x │
│ │ Int64 │
├─────┼───────┤
│ 1 │ 1 │
│ 2 │ 2 │
│ 3 │ 3 │
│ 4 │ 4 │
I want to make this:
4×2 DataFrame
│ Row │ x │ y │
│ │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1 │ 1 │ 0.5 │
│ 2 │ 2 │ 1.0 │
│ 3 │ 3 │ 1.5 │
│ 4 │ 4 │ 2.0 │
This seems like a perfect application for DataFramesMeta, either @with or @eachrow, but I haven't been able to get my expression to evaluate as expected in an environment where :x exists.
Basically, I want to be able to iterate over (k, v) pairs in dict and create one new column for each Symbol(k) with corresponding values eval(Meta.parse(v)), or something along those lines, where the evaluation occurs such that Symbols like :x exist at the time of evaluation.
I didn't expect this to work, and it doesn't:
[df[Symbol(k)] = eval(Meta.parse(v)) for (k, v) in dict]
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
But this illustrates the problem: I need the expressions to be evaluated in an environment where the symbols they contain exist.
However, moving it inside a @with doesn't work:
using DataFramesMeta
@with(df, [eval(Meta.parse(v)) for (k, v) in dict])
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
Using @eachrow fails the same way:
using DataFramesMeta
@eachrow df begin
for (k, v) in dict
@newcol tmp::Vector{Float32}
tmp = eval(Meta.parse(v))
end
end
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
I'm guessing I'm unclear on some key element of how DataFramesMeta creates an environment within a DataFrame. I also don't necessarily have to use DataFramesMeta for this; any reasonably concise option will work, since I can encapsulate it in a package function.
Note: I control the format of the strings to be parsed into expressions, but I want to avoid complexity such as specifying the name of the DataFrame object in the string, or broadcasting every operation. I want the expression syntax in the initial string to be reasonably clear to non-Julia programmers.
UPDATE: I tried all three solutions in the comments on this question, and they have a problem: they don't work inside functions.
dict = Dict(("y" => ":x / 2"))
data = DataFrame(x = [1, 2, 3, 4])
function transform_from_dict(df, dict)
new = eval(Meta.parse("@transform(df, " * join(join.(collect(dict), " = "), ", ") * ")"))
return new
end
transform_from_dict(data, dict)
ERROR: UndefVarError: df not defined
Or:
function transform_from_dict!(df, dict)
[df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v))))) for (k, v) in dict]
return nothing
end
transform_from_dict!(data, dict)
ERROR: UndefVarError: df not defined
Upvotes: 3
Views: 327
Reputation: 9512
I worked on this answer in parallel with @Ajar; nothing is copied from that answer, and I did not know about it at the time. I was completely new to Julia, so I had to install it first (I assumed the online compilers would not know about DataFrame). Later I understood that these packages must be loaded at the start anyway, whether online or offline, so I have added the package setup that beginners might need.
using Pkg
Pkg.add("DataFrames")
Pkg.add("DataFramesMeta")
using DataFrames
using DataFramesMeta
dict = Dict(("y" => ":x / 2"))
df = DataFrame(x = [1, 2, 3, 4])
The @with solution:
julia> function transform_from_dict!(k, v)
global df
df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v)))))
return nothing
end
transform_from_dict! (generic function with 2 methods)
julia> [transform_from_dict!(k, v) for (k, v) in dict]
1-element Array{Nothing,1}:
nothing
julia> df
4×2 DataFrame
Row │ x y
│ Int64 Float64
─────┼────────────────
1 │ 1 0.5
2 │ 2 1.0
3 │ 3 1.5
4 │ 4 2.0
The @transform solution:
julia> function transform_from_dict(df, dict)
global new
new = eval(Meta.parse("@transform(df, " * join(join.(collect(dict), " = "), ", ") * ")"))
return new
end
transform_from_dict (generic function with 1 method)
julia> transform_from_dict(df, dict)
4×2 DataFrame
Row │ x y
│ Int64 Float64
─────┼────────────────
1 │ 1 0.5
2 │ 2 1.0
3 │ 3 1.5
4 │ 4 2.0
Thanks go to the other commenters; the essential ideas are listed in @Ajar's answer.
Upvotes: 1
Reputation: 1826
OK, combining answers from all of the commenters works!
using DataFrames
using DataFramesMeta
dict = Dict(("y" => ":x / 2"))
data = DataFrame(x = [1, 2, 3, 4])
@张实唯's approach using @with:
# using @with
function transform_from_dict1(df, dict)
global df
[df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v))))) for (k, v) in dict]
return df
end
transform_from_dict1(data, dict)
# 4×2 DataFrame
# │ Row │ x │ y │
# │ │ Int64 │ Float64 │
# ├─────┼───────┼─────────┤
# │ 1 │ 1 │ 0.5 │
# │ 2 │ 2 │ 1.0 │
# │ 3 │ 3 │ 1.5 │
# │ 4 │ 4 │ 2.0 │
And @Bogumił Kamiński's approach using @transform:
# using @transform
function transform_from_dict2(df, dict)
global df
new_df = eval(Meta.parse("@transform(df, " * join(join.(collect(dict), " = "), ", ") * ")"))
return new_df
end
transform_from_dict2(data, dict)
# 4×2 DataFrame
# │ Row │ x │ y │
# │ │ Int64 │ Float64 │
# ├─────┼───────┼─────────┤
# │ 1 │ 1 │ 0.5 │
# │ 2 │ 2 │ 1.0 │
# │ 3 │ 3 │ 1.5 │
# │ 4 │ 4 │ 2.0 │
Both incorporate the fix from @Lorenz using global.
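For context on why global matters here: eval always evaluates in the module's global scope, so a name that exists only as a function argument is invisible to the evaluated expression. The sketch below uses only Base Julia to show this, together with interpolation ($) as one possible way to pass a local value into the expression without a global. The function and variable names are hypothetical and are not part of DataFramesMeta.

```julia
# `eval` looks names up in the module's global scope, not in the calling function.
function eval_by_name(local_column)
    try
        # `local_column` is only an argument here, so the lookup fails
        # unless a global with the same name happens to exist.
        return eval(:(local_column ./ 2))
    catch err
        err isa UndefVarError || rethrow()
        return nothing
    end
end

# Interpolating with `$` splices the value itself into the expression,
# so no name lookup is needed at evaluation time.
function eval_by_value(local_column)
    return eval(:($local_column ./ 2))
end

by_name = eval_by_name([1, 2, 3, 4])    # lookup by name fails, so this is `nothing`
by_value = eval_by_value([1, 2, 3, 4])  # the vector was spliced in, so this succeeds
```

Applied to the functions above, the same idea would be to interpolate the df argument rather than refer to it by name. Whether the DataFramesMeta macros accept an interpolated DataFrame is an assumption that would need checking.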
Note that the second form allocates about 25x more memory than the first in the measurement below, likely due in part to the creation of a second DataFrame:
julia> @allocated transform_from_dict1(data, dict)
853948
julia> @allocated transform_from_dict2(data, dict)
22009111
I also think the first form is a little more clear, so that's what I'm using internally.
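As an alternative that avoids both the macro and the global, one could substitute the column references directly into the parsed expression before evaluating it. This is only a sketch using Base Julia: substitute_columns is a hypothetical helper, and a plain Dict of vectors stands in for the DataFrame columns.

```julia
# Fallback: leave numbers, operators, and other leaves untouched.
substitute_columns(ex, columns) = ex

# A quoted symbol such as :x parses to a QuoteNode; swap it for the column data
# when a column with that name exists.
substitute_columns(ex::QuoteNode, columns) =
    haskey(columns, ex.value) ? columns[ex.value] : ex

# Recurse through nested expressions, rebuilding them with substituted arguments.
substitute_columns(ex::Expr, columns) =
    Expr(ex.head, (substitute_columns(arg, columns) for arg in ex.args)...)

columns = Dict(:x => [1, 2, 3, 4])   # stand-in for the DataFrame's columns
parsed = Meta.parse(":x / 2")
y = eval(substitute_columns(parsed, columns))
```

With a real DataFrame, the lookup would presumably read the column with df[!, name] instead of indexing a Dict. Because the data is spliced into the expression, nothing depends on a global name at evaluation time.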
Note that you may need to broadcast logical operators if you have those in your transforms, and that as usual you'll need to handle any missing data issues up front.
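To make the note about broadcasting and missing data concrete, here is a small sketch on plain vectors in Base Julia. The variable names and the replacement value 0 are illustrative assumptions; in a transform string, the same expressions would use :x in place of x, for example "(:x .> 1) .& (:x .< 4)".

```julia
x = [1, 2, 3, 4]

# Element-wise comparisons and logical AND need the dotted forms;
# the parentheses matter because `.&` binds tighter than `.>` and `.<`.
in_range = (x .> 1) .& (x .< 4)

# Missing values propagate through arithmetic, so replace them first
# (here with 0, an arbitrary choice for illustration).
with_gap = [1, missing, 3, 4]
halved = coalesce.(with_gap, 0) ./ 2
```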
Upvotes: 1