Reputation: 88
I have a data frame where some columns have missing values. I would like that if missing values are found, an alternative from a second column is picked. For example, in:
df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
I would like missing
to be substituted with 4.
At the moment I am solving the problem with this solution:
for row in eachrow(df)
if ismissing(row[:x])
row[:x] = row[:y]
end
end
But I wonder if a better solution that avoids for-loops can be foundπ€.
I tried with replace(A, old_new::Pair...; [count::Integer])
, but it seems that the pair accepts only scalars, and also with broadcasting I was not able to have success.
Do you have any suggestions?
Upvotes: 6
Views: 1024
Reputation: 69949
You can use coalesce
:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ2 DataFrame
Row β x y
β Int64? Int64
ββββββΌββββββββββββββββ
1 β 0 2
2 β missing 4
3 β 2 6
julia> df.x .= coalesce.(df.x, df.y)
3-element Array{Union{Missing, Int64},1}:
0
4
2
julia> df
3Γ2 DataFrame
Row β x y
β Int64? Int64
ββββββΌβββββββββββββββ
1 β 0 2
2 β 4 4
3 β 2 6
or if you like piping-aware functions:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ2 DataFrame
Row β x y
β Int64? Int64
ββββββΌββββββββββββββββ
1 β 0 2
2 β missing 4
3 β 2 6
julia> transform!(df, [:x, :y] => ByRow(coalesce) => :x)
3Γ2 DataFrame
Row β x y
β Int64 Int64
ββββββΌββββββββββββββ
1 β 0 2
2 β 4 4
3 β 2 6
and this is the same, but not requiring you to remember about coalesce
:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ2 DataFrame
Row β x y
β Int64? Int64
ββββββΌββββββββββββββββ
1 β 0 2
2 β missing 4
3 β 2 6
julia> transform!(df, [:x, :y] => ByRow((x,y) -> ismissing(x) ? y : x) => :x)
3Γ2 DataFrame
Row β x y
β Int64 Int64
ββββββΌββββββββββββββ
1 β 0 2
2 β 4 4
3 β 2 6
Upvotes: 5