Reputation: 3828
When I try to use the dot operator (element wise operation) in a DataFrame where a function returning a tuple is applied, I get the following error.
Here is a toy example,
df = DataFrame()
df[:, :x] = rand(5)
df[:, :y] = rand(5)
#Function that returns two values in the form of a tuple
add_minus_two(x,y) = (x-y,x+y)
df[:,"x+y"] = add_minus_two.(df[:,:x], df[:,:y])[2]
#Out > ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)
#However removing the dot operator works fine
df[:,"x+y"] = add_minus_two(df[:,:x], df[:,:y])[2]
#Out > 5 x 3 DataFrame
#Furthermore if its just one argument either dot or not, works fine as well
add_two(x,y) = x+y
df[:, "x+y"] = add_two(df[:,:x], df[:,:y])
df[:, "x+y"] = add_two.(df[:,:x], df[:,:y])
#out > 5 x 3 DataFrame
Any reason why this is. I thought for elementwise operation you need to use "dot" operator.
Also for my actual problem (when a function return 2 values in a tuple), when NOT using the dot operator gives,
ERROR: MethodError: no method matching compute_T(::Vector{Float64}, ::Vector{Float64})
and using the dot operator gives,
ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)
and returning a single argument, similar to the toy example works fine as well.
Any clue what I am doing incorrectly here ?
Upvotes: 2
Views: 1208
Reputation: 69819
This is not a DataFrames.jl issue, but how Julia Base works.
I concentrate only on RHS, as LHS is irrelevant (and RHS is unrelated to DataFrames.jl).
First, how to write what you want. Initialization:
julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame
julia> df[:, :x] = rand(5)
5-element Vector{Float64}:
0.6146045473316457
0.6319531776216596
0.599267794937812
0.40864382019544965
0.3738682778395166
julia> df[:, :y] = rand(5)
5-element Vector{Float64}:
0.07891853567296825
0.2143545316544586
0.5943274462916335
0.2182702556068421
0.5810132720450707
julia> add_minus_two(x,y) = (x-y,x+y)
add_minus_two (generic function with 1 method)
And now you get:
julia> add_minus_two(df[:,:x], df[:,:y])
([0.5356860116586775, 0.417598645967201, 0.004940348646178538, 0.19037356458860755, -0.2071449942055541], [0.693523083004614, 0.8463077092761182, 1.1935952412294455, 0.6269140758022917, 0.9548815498845873])
julia> add_minus_two.(df[:,:x], df[:,:y])
5-element Vector{Tuple{Float64, Float64}}:
(0.5356860116586775, 0.693523083004614)
(0.417598645967201, 0.8463077092761182)
(0.004940348646178538, 1.1935952412294455)
(0.19037356458860755, 0.6269140758022917)
(-0.2071449942055541, 0.9548815498845873)
julia> add_minus_two(df[:,:x], df[:,:y])[2]
5-element Vector{Float64}:
0.693523083004614
0.8463077092761182
1.1935952412294455
0.6269140758022917
0.9548815498845873
julia> add_minus_two.(df[:,:x], df[:,:y])[2]
(0.417598645967201, 0.8463077092761182)
julia> getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2) # this is probably what you want
5-element Vector{Float64}:
0.693523083004614
0.8463077092761182
1.1935952412294455
0.6269140758022917
0.9548815498845873
Now the point is that when you write:
df[:,"x+y"] = whatever_you_pass
The whatever_you_pass
part must be an AbstractVector
with an appropriate number of columns. This means that what will work is:
add_minus_two.(df[:,:x], df[:,:y])
add_minus_two(df[:,:x], df[:,:y])[2]
getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2)
and what will fail is (as in these cases a Tuple
not AbstractVector
is produced)
add_minus_two(df[:,:x], df[:,:y])
add_minus_two.(df[:,:x], df[:,:y])[2]
Out of the working syntaxes just pick the one you want.
The general recommendation is that when you do assignment always inspect the RHS stand alone and analyze if it has a proper structure.
Also, notably, this will work:
julia> transform(df, [:x, :y] => ByRow(add_minus_two) => ["x-y", "x+y"])
5×4 DataFrame
Row │ x y x-y x+y
│ Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────
1 │ 0.614605 0.0789185 0.535686 0.693523
2 │ 0.631953 0.214355 0.417599 0.846308
3 │ 0.599268 0.594327 0.00494035 1.1936
4 │ 0.408644 0.21827 0.190374 0.626914
5 │ 0.373868 0.581013 -0.207145 0.954882
(you have not asked about it but maybe this is what you actually are looking for - and as opposed to setindex!
this syntax is DataFrames.jl specific)
Upvotes: 1