Michele Garau
Michele Garau

Reputation: 88

Replace missing values with values from another column in Julia Dataframe

I have a data frame where some columns have missing values. I would like that if missing values are found, an alternative from a second column is picked. For example, in:

df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])

I would like missing to be substituted with 4.

At the moment I am solving the problem with this solution:

for row in eachrow(df)
    if ismissing(row[:x])
        row[:x] = row[:y]
    end
end

But I wonder if a better solution that avoids for-loops can be foundπŸ€”.

I tried with replace(A, old_new::Pair...; [count::Integer]), but it seems that the pair accepts only scalars, and also with broadcasting I was not able to have success.

Do you have any suggestions?

Upvotes: 6

Views: 1024

Answers (1)

BogumiΕ‚ KamiΕ„ski
BogumiΕ‚ KamiΕ„ski

Reputation: 69949

You can use coalesce:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ—2 DataFrame
 Row β”‚ x        y
     β”‚ Int64?   Int64
─────┼────────────────
   1 β”‚       0      2
   2 β”‚ missing      4
   3 β”‚       2      6

julia> df.x .= coalesce.(df.x, df.y)
3-element Array{Union{Missing, Int64},1}:
 0
 4
 2

julia> df
3Γ—2 DataFrame
 Row β”‚ x       y
     β”‚ Int64?  Int64
─────┼───────────────
   1 β”‚      0      2
   2 β”‚      4      4
   3 β”‚      2      6

or if you like piping-aware functions:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ—2 DataFrame
 Row β”‚ x        y
     β”‚ Int64?   Int64
─────┼────────────────
   1 β”‚       0      2
   2 β”‚ missing      4
   3 β”‚       2      6

julia> transform!(df, [:x, :y] => ByRow(coalesce) => :x)
3Γ—2 DataFrame
 Row β”‚ x      y
     β”‚ Int64  Int64
─────┼──────────────
   1 β”‚     0      2
   2 β”‚     4      4
   3 β”‚     2      6

and this is the same, but not requiring you to remember about coalesce:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ—2 DataFrame
 Row β”‚ x        y
     β”‚ Int64?   Int64
─────┼────────────────
   1 β”‚       0      2
   2 β”‚ missing      4
   3 β”‚       2      6

julia> transform!(df, [:x, :y] => ByRow((x,y) -> ismissing(x) ? y : x) => :x)
3Γ—2 DataFrame
 Row β”‚ x      y
     β”‚ Int64  Int64
─────┼──────────────
   1 β”‚     0      2
   2 β”‚     4      4
   3 β”‚     2      6

Upvotes: 5

Related Questions