Reputation: 43
How can one initialize a DataFrame column with missing values and later fill some of its elements with Float64 values?
julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.840074  0.673613
   2 │ 0.98867   0.33807
   3 │ 0.433315  0.150228
   4 │ 0.495254  0.833268
julia> insertcols!(df, :c => missing)
4×3 DataFrame
 Row │ a         b         c
     │ Float64   Float64   Missing
─────┼─────────────────────────────
   1 │ 0.840074  0.673613  missing
   2 │ 0.98867   0.33807   missing
   3 │ 0.433315  0.150228  missing
   4 │ 0.495254  0.833268  missing
julia> for row in eachrow(df)
           if rand() > 0.5  # in practice, based on processing of the row
               row[:c] = 1.0
           end
       end
ERROR: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous.
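The failure comes from the column's element type: `insertcols!` broadcasts the scalar `missing`, so column `c` is a `Vector{Missing}` (shown as `Missing` in the header above), which can hold nothing but `missing`. A minimal check, assuming DataFrames.jl 1.x; the exact error text may differ between Julia versions:

```julia
using DataFrames

df = DataFrame(:a => rand(4), :b => rand(4))
insertcols!(df, :c => missing)

# Broadcasting the scalar `missing` yields a Vector{Missing}:
# a vector whose only admissible value is `missing`.
println(eltype(df.c))   # Missing

# Storing a Float64 into it therefore throws
# (the exception message depends on the Julia version).
threw = try
    df[1, :c] = 1.0
    false
catch
    true
end
println(threw)          # true
```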
Reputation: 652
I ran into this problem the other day, so I wrote a more general function that widens a column's element type with any types (not just Missing):
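# Replace column `colname` with a copy whose element type is the union of
# its current element type and `appendtypes`
function add_type!(df::DataFrame, colname::Symbol, appendtypes::Type...)
    df[!, colname] =
        Vector{Union{appendtypes..., Base.uniontypes(eltype(df[!, colname]))...}}(df[!, colname])
    return df
end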
You can of course make a dedicated function for missing:
add_type!(df::DataFrame, colname::Symbol) = add_type!(df, colname, Missing)
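Applied to the question's data, widening the all-`missing` column with `Float64` lets the original loop run. The sketch below repeats `add_type!` so it is self-contained; note that it relies on `Base.uniontypes`, an internal, unexported Base helper that could change between Julia versions. For the `Missing`-only case, DataFrames.jl's own `allowmissing!` does the same job.

```julia
using DataFrames

# add_type! as defined above: widens the element type of column `colname`
function add_type!(df::DataFrame, colname::Symbol, appendtypes::Type...)
    df[!, colname] =
        Vector{Union{appendtypes..., Base.uniontypes(eltype(df[!, colname]))...}}(df[!, colname])
    return df
end
add_type!(df::DataFrame, colname::Symbol) = add_type!(df, colname, Missing)

df = DataFrame(:a => rand(4), :b => rand(4))
insertcols!(df, :c => missing)   # eltype(df.c) is Missing

add_type!(df, :c, Float64)       # eltype(df.c) becomes Union{Float64, Missing}
add_type!(df, :a)                # eltype(df.a) becomes Union{Missing, Float64}

# The loop from the question now succeeds.
for row in eachrow(df)
    if rand() > 0.5
        row[:c] = 1.0
    end
end
```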
Reputation: 69949
This is the way I normally do it:
julia> using DataFrames
julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.388546  0.522189
   2 │ 0.232263  0.102722
   3 │ 0.519866  0.578753
   4 │ 0.493797  0.146636
julia> df.c = missings(Float64, nrow(df))
4-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
 missing
julia> df
4×3 DataFrame
 Row │ a         b         c
     │ Float64   Float64   Float64?
─────┼──────────────────────────────
   1 │ 0.388546  0.522189  missing
   2 │ 0.232263  0.102722  missing
   3 │ 0.519866  0.578753  missing
   4 │ 0.493797  0.146636  missing
See also https://bkamins.github.io/julialang/2021/09/03/missing.html for more examples of working with missing values.
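With the column created this way, filling it in afterwards works as the question intended. A minimal sketch; `missings` comes from Missings.jl and is available after `using DataFrames`, as in the session above, while the condition `row.a > 0.5` and the value `row.a + row.b` are hypothetical stand-ins for real row processing:

```julia
using DataFrames

df = DataFrame(:a => rand(4), :b => rand(4))
df.c = missings(Float64, nrow(df))   # Vector{Union{Missing, Float64}}, all missing

for row in eachrow(df)
    if row.a > 0.5                   # hypothetical stand-in for row processing
        row.c = row.a + row.b        # Float64 values can now be stored
    end
end
```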
Reputation: 43
One can also do this as follows:
df.c = Vector{Union{Float64,Missing}}(missing, size(df, 1))
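This uses the array constructor that fills every slot with `missing` (it requires `Missing` to be part of the element type) and yields the same vector as `missings(Float64, nrow(df))`; `size(df, 1)` and `nrow(df)` both return the number of rows. A quick check, assuming DataFrames.jl 1.x:

```julia
using DataFrames

df = DataFrame(:a => rand(4), :b => rand(4))

# Vector{T}(missing, n) builds an n-element vector of `missing`
# whose element type T also admits Float64.
df.c = Vector{Union{Float64, Missing}}(missing, size(df, 1))

# Same type and contents as the missings-based version.
println(isequal(df.c, missings(Float64, nrow(df))))   # true

df[2, :c] = 1.0   # assigning a Float64 now succeeds
```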