Manuel

Reputation: 2542

Indexing Dataframe with variable in Julia

I want to create an indexed subset of a DataFrame and use a variable inside it. In this case I want to change all -9999 values in the first column to NA. If I do df[df[:1] .== -9999, :1] = NA, it works as it should. But if I use a variable as the indexer, it throws an error (LoadError: KeyError: key :i not found):

i = 1
df[df[:i] .== -9999, :i] = NA  

Upvotes: 2

Views: 1869

Answers (1)

Gnimuc

Reputation: 8566

:i is actually a symbol in Julia, not a reference to the variable i:

julia> typeof(:i)
Symbol

You can bind a variable to a symbol like this:

julia> i = Symbol(2)
Symbol("2")

Then you can simply use df[df[i] .== 1, i] = 123:

julia> df
10×1 DataFrames.DataFrame
│ Row │ 2   │
├─────┼─────┤
│ 1   │ 123 │
│ 2   │ 2   │
│ 3   │ 3   │
│ 4   │ 4   │
│ 5   │ 5   │
│ 6   │ 6   │
│ 7   │ 7   │
│ 8   │ 8   │
│ 9   │ 9   │
│ 10  │ 10  │

It's worth noting that in your example df[df[:1] .== -9999, :1], :1 is NOT a symbol:

julia> :1
1
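
The difference between quoting a name, quoting a number, and building a symbol from a value can be checked with Base Julia alone. This is only a small sketch; the variable names are made up for illustration:

```julia
i = 1

sym_literal    = :i         # the symbol named "i"; the value stored in i is never consulted
sym_from_value = Symbol(i)  # built from the value of i, so it is Symbol("1")
quoted_number  = :1         # quoting a numeric literal just yields the number itself

println(sym_literal === :i)
println(sym_from_value === Symbol("1"))
println(quoted_number === 1)
```

All three comparisons print true, which is why df[:i] looks for a column literally named i, while df[:1] is plain positional indexing.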

In fact, the expression is equivalent to df[df[1] .== -9999, 1], which works because there is a corresponding getindex method whose col_ind argument accepts an ordinary numeric index as well as a Symbol:

julia> @which df[df[1].==1, 1]
getindex{T<:Real}(df::DataFrames.DataFrame, row_inds::AbstractArray{T,1}, col_ind::Union{Real,Symbol})

Since you just want to change the first (or nth) column, it makes no difference whether you index with Symbol("1") or 1, as long as your columns are named after their positions, like this:

│ Row │ 1   │ 2   │ 3   │...
├─────┼─────┼─────┼─────┤
│ 1   │     │     │     │...
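
For readers on a newer setup, here is a rough sketch of the same idea, assuming DataFrames.jl 1.x, where missing has replaced NA and columns are accessed as df[!, col]. The sample data and the column names a and b are made up; the point is only that the column position lives in an ordinary variable:

```julia
using DataFrames  # assumption: DataFrames.jl 1.x is installed

# Hypothetical sample data with -9999 used as a sentinel value.
df = DataFrame(a = [1, -9999, 3], b = [-9999, 5, 6])

i = 1                                  # column position held in a variable
allowmissing!(df, i)                   # let column i hold `missing`
replace!(df[!, i], -9999 => missing)   # swap the sentinel in place

println(df)
```

Only column i is changed here; column b keeps its -9999 entry. Passing a Symbol such as :a in place of the integer should work the same way.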

Upvotes: 5
