tazzz

Reputation: 43

Initializing a column with missing values and filling in fields later

How can I initialize a DataFrame column with missing values and later fill some of its elements with Float64 values?

julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b        
     │ Float64   Float64  
─────┼────────────────────
   1 │ 0.840074  0.673613
   2 │ 0.98867   0.33807
   3 │ 0.433315  0.150228
   4 │ 0.495254  0.833268

julia> insertcols!(df, :c => missing)
4×3 DataFrame
 Row │ a         b         c       
     │ Float64   Float64   Missing 
─────┼─────────────────────────────
   1 │ 0.840074  0.673613  missing 
   2 │ 0.98867   0.33807   missing 
   3 │ 0.433315  0.150228  missing 
   4 │ 0.495254  0.833268  missing 

julia> for row in eachrow(df)
           if rand() > 0.5 #based on processing of the row
               row[:c] = 1.0
           end
       end
ERROR: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous.

Upvotes: 2

Views: 59

Answers (3)

Jake Ireland

Reputation: 652

I had this problem the other day, so I wrote a more general function that widens an existing column's element type with any additional types (not just Missing):

function add_type!(df::DataFrame, colname::Symbol, appendtypes::Type...)
    df[!, colname] =
        Vector{Union{appendtypes..., Base.uniontypes(eltype(df[!, colname]))...}}(df[!, colname])
    return df
end

You can of course make a dedicated function for missing:

add_type!(df::DataFrame, colname::Symbol) = add_type!(df, colname, Missing)
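
Applied to the data frame from the question, this might look like the sketch below. The definition is repeated so the snippet runs on its own; the row index and the value 1.0 are only illustrative. Since the column created by `insertcols!(df, :c => missing)` already has element type Missing, the type to add here is Float64.

```julia
using DataFrames

# Definition repeated from above so the snippet is self-contained.
function add_type!(df::DataFrame, colname::Symbol, appendtypes::Type...)
    df[!, colname] =
        Vector{Union{appendtypes..., Base.uniontypes(eltype(df[!, colname]))...}}(df[!, colname])
    return df
end

df = DataFrame(:a => rand(4), :b => rand(4))
insertcols!(df, :c => missing)   # column c has element type Missing
add_type!(df, :c, Float64)       # widen c to Union{Float64, Missing}
df[2, :c] = 1.0                  # illustrative assignment; it no longer throws
```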

Upvotes: 0

Bogumił Kamiński

Reputation: 69949

This is the way I normally do it:

julia> using DataFrames

julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.388546  0.522189
   2 │ 0.232263  0.102722
   3 │ 0.519866  0.578753
   4 │ 0.493797  0.146636

julia> df.c = missings(Float64, nrow(df))
4-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
 missing

julia> df
4×3 DataFrame
 Row │ a         b         c
     │ Float64   Float64   Float64?
─────┼──────────────────────────────
   1 │ 0.388546  0.522189   missing
   2 │ 0.232263  0.102722   missing
   3 │ 0.519866  0.578753   missing
   4 │ 0.493797  0.146636   missing
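
With the column typed as `Float64?`, a loop like the one in the question should run without the MethodError. A minimal sketch, assuming fixed values in `a` and using `row.a > 0.5` as a stand-in for whatever per-row processing decides whether to fill the field:

```julia
using DataFrames

# Fixed values (an assumption for illustration) instead of rand(4).
df = DataFrame(:a => [0.1, 0.9, 0.4, 0.7], :b => [0.5, 0.2, 0.8, 0.3])
df.c = missings(Float64, nrow(df))   # element type Union{Missing, Float64}

for row in eachrow(df)
    if row.a > 0.5        # stand-in for the real row-processing condition
        row.c = 1.0       # writes through to the parent data frame
    end
end
```

Rows that fail the condition keep `missing` in column `c`.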

See also https://bkamins.github.io/julialang/2021/09/03/missing.html for more examples of working with missing values.

Upvotes: 2

tazzz

Reputation: 43

One can do this the following way:

df.c = Vector{Union{Float64,Missing}}(missing, size(df, 1))

Upvotes: 2
