imantha

Reputation: 3828

DataFrames : no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)

When I use the dot operator (element-wise application) with a function that returns a tuple and assign the result to a DataFrame column, I get the error shown below.

Here is a toy example:

df = DataFrame()

df[:, :x] = rand(5)
df[:, :y] = rand(5)

#Function that returns two values in the form of a tuple
add_minus_two(x,y) = (x-y,x+y)

df[:,"x+y"] = add_minus_two.(df[:,:x], df[:,:y])[2]
#Out > ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)

#However removing the dot operator works fine
df[:,"x+y"] = add_minus_two(df[:,:x], df[:,:y])[2]
#Out > 5 x 3 DataFrame

#Furthermore, if the function returns just one value, it works fine with or without the dot
add_two(x,y) = x+y
df[:, "x+y"] = add_two(df[:,:x], df[:,:y])
df[:, "x+y"] = add_two.(df[:,:x], df[:,:y])
#out > 5 x 3 DataFrame

Is there a reason for this? I thought that element-wise operations require the "dot" operator.

Also, in my actual problem (where the function returns two values in a tuple), NOT using the dot operator gives:

 ERROR: MethodError: no method matching compute_T(::Vector{Float64}, ::Vector{Float64})

and using the dot operator gives,

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)  

Returning a single value, as in the toy example, works fine as well.

Any clue what I am doing wrong here?

Upvotes: 2

Views: 1208

Answers (1)

Bogumił Kamiński

Reputation: 69819

This is not a DataFrames.jl issue; it is how Julia Base works.

I concentrate only on the right-hand side (RHS) of the assignment, as the left-hand side (LHS) is irrelevant here (and the RHS is unrelated to DataFrames.jl).

First, how to write what you want. Initialization:

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame

julia> df[:, :x] = rand(5)
5-element Vector{Float64}:
 0.6146045473316457
 0.6319531776216596
 0.599267794937812
 0.40864382019544965
 0.3738682778395166

julia> df[:, :y] = rand(5)
5-element Vector{Float64}:
 0.07891853567296825
 0.2143545316544586
 0.5943274462916335
 0.2182702556068421
 0.5810132720450707

julia> add_minus_two(x,y) = (x-y,x+y)
add_minus_two (generic function with 1 method)

And now you get:

julia> add_minus_two(df[:,:x], df[:,:y])
([0.5356860116586775, 0.417598645967201, 0.004940348646178538, 0.19037356458860755, -0.2071449942055541], [0.693523083004614, 0.8463077092761182, 1.1935952412294455, 0.6269140758022917, 0.9548815498845873])

julia> add_minus_two.(df[:,:x], df[:,:y])
5-element Vector{Tuple{Float64, Float64}}:
 (0.5356860116586775, 0.693523083004614)
 (0.417598645967201, 0.8463077092761182)
 (0.004940348646178538, 1.1935952412294455)
 (0.19037356458860755, 0.6269140758022917)
 (-0.2071449942055541, 0.9548815498845873)

julia> add_minus_two(df[:,:x], df[:,:y])[2]
5-element Vector{Float64}:
 0.693523083004614
 0.8463077092761182
 1.1935952412294455
 0.6269140758022917
 0.9548815498845873

julia> add_minus_two.(df[:,:x], df[:,:y])[2]
(0.417598645967201, 0.8463077092761182)

julia> getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2) # this is probably what you want
5-element Vector{Float64}:
 0.693523083004614
 0.8463077092761182
 1.1935952412294455
 0.6269140758022917
 0.9548815498845873
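The difference between the last two calls is the order of indexing and broadcasting. Here is a minimal sketch using only Base Julia, with small fixed vectors standing in for the random columns (the variable names are my own, chosen for illustration):

```julia
# Fixed inputs instead of rand(5), so the results are easy to check by hand
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

add_minus_two(x, y) = (x - y, x + y)

# Broadcasting calls the function once per pair of elements,
# so the result is a Vector of Tuples
tuples = add_minus_two.(x, y)

# Indexing that vector with [2] picks its second element: a single Tuple
second_tuple = tuples[2]

# Broadcasting getindex indexes into every Tuple and collects the second entries
sums = getindex.(tuples, 2)

# last.(tuples) is an equivalent spelling here, since every Tuple has two entries
sums_alt = last.(tuples)
```

Here `second_tuple` is one `Tuple{Float64, Float64}`, which is exactly the type named in the `setindex!` error, while `sums` is a `Vector{Float64}` with one element per input row.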

Now the point is that when you write:

df[:,"x+y"] = whatever_you_pass

The whatever_you_pass part must be an AbstractVector with an appropriate number of elements (one per row of the data frame). This means that what will work is:

  • add_minus_two.(df[:,:x], df[:,:y])
  • add_minus_two(df[:,:x], df[:,:y])[2]
  • getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2)

and what will fail is (as in these cases a Tuple, not an AbstractVector, is produced):

  • add_minus_two(df[:,:x], df[:,:y])
  • add_minus_two.(df[:,:x], df[:,:y])[2]

Out of the working syntaxes just pick the one you want. Note that the first one stores a column of tuples, while the other two store the x+y values.
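The five cases can be checked without a data frame at all. The sketch below uses only Base Julia; `is_column_like` is a hypothetical helper written for this illustration (it is not part of DataFrames.jl) and only mirrors the rule stated above, not the package's actual implementation:

```julia
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
add_minus_two(x, y) = (x - y, x + y)

# Hypothetical helper: true when v is a vector with exactly n elements
is_column_like(v, n) = v isa AbstractVector && length(v) == n

n = length(x)

# The three expressions that work as a new column
works_1 = is_column_like(add_minus_two.(x, y), n)               # Vector of Tuples
works_2 = is_column_like(add_minus_two(x, y)[2], n)             # Vector of Float64
works_3 = is_column_like(getindex.(add_minus_two.(x, y), 2), n) # Vector of Float64

# The two expressions that fail: each produces a Tuple
fails_1 = is_column_like(add_minus_two(x, y), n)      # Tuple of two Vectors
fails_2 = is_column_like(add_minus_two.(x, y)[2], n)  # Tuple of two Float64
```

The first three flags come out `true` and the last two `false`, matching the two lists.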

The general recommendation is that, when you do an assignment, you should always inspect the RHS on its own and check that it has the proper structure.
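The compute_T errors quoted in the question fit the same pattern. I do not know how compute_T is defined, so the sketch below assumes it only has a method for scalar arguments; `scalar_only` is a hypothetical stand-in, not the real function. It uses only Base Julia:

```julia
# Hypothetical stand-in for compute_T: a method exists only for scalar Float64 inputs
scalar_only(x::Float64, y::Float64) = (x - y, x + y)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Without the dot, Julia looks for a method that accepts two Vectors and finds none
no_dot_fails = try
    scalar_only(x, y)
    false
catch err
    err isa MethodError
end

# With the dot, the function is applied pair by pair, giving a Vector of Tuples
rows = scalar_only.(x, y)

# Split the Vector of Tuples into one Vector per component
differences = first.(rows)
sums = last.(rows)
```

Under that assumption the call without the dot throws a `MethodError`, and the broadcast call has to be split per component (or indexed with `getindex.`) before each part can be stored as a column.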

Also, notably, this will work:

julia> transform(df, [:x, :y] => ByRow(add_minus_two) => ["x-y", "x+y"])
5×4 DataFrame
 Row │ x         y          x-y          x+y
     │ Float64   Float64    Float64      Float64
─────┼────────────────────────────────────────────
   1 │ 0.614605  0.0789185   0.535686    0.693523
   2 │ 0.631953  0.214355    0.417599    0.846308
   3 │ 0.599268  0.594327    0.00494035  1.1936
   4 │ 0.408644  0.21827     0.190374    0.626914
   5 │ 0.373868  0.581013   -0.207145    0.954882

(You have not asked about it, but maybe this is what you are actually looking for. As opposed to setindex!, this syntax is specific to DataFrames.jl.)

Upvotes: 1
