I want to replace missing values in a pipe. I'm aware of how to do it outside a pipe (see this post for more on that).
julia> df = DataFrame(:b => [2, 3], :a => [missing, "treatment"])
2×2 DataFrame
│ Row │ b │ a │
│ │ Int64 │ String? │
├─────┼───────┼───────────┤
│ 1 │ 2 │ missing │
│ 2 │ 3 │ treatment │
julia> df.a = replace(df.a, missing => "control") # works as expected
julia> @pipe df |>
replace(_.a, missing => "control") |>
select(_, :a, :b) # error because replace doesn't return a DataFrame.
I tried writing a transformation function, but ismissing() doesn't work in this context. I assume this is because it gets handed the whole column and not the separate values. But ismissing.() doesn't work either. Does somebody have an idea how I can make this transformation work?
julia> ismissing(df.a[1])
true
julia> @pipe df |>
DataFrames.transform(_, :a => x -> ismissing(x) ? "control" : x)
2×3 DataFrame
│ Row │ b │ a │ a_function │
│ │ Int64 │ String? │ String? │
├─────┼───────┼───────────┼────────────┤
│ 1 │ 2 │ missing │ missing │
│ 2 │ 3 │ treatment │ treatment │
julia> @pipe df |>
DataFrames.transform(_, :a => x -> ismissing(x) ? x : "control" )
2×3 DataFrame
│ Row │ b │ a │ a_function │
│ │ Int64 │ String? │ String │
├─────┼───────┼───────────┼────────────┤
│ 1 │ 2 │ missing │ control │
│ 2 │ 3 │ treatment │ control │
julia> @pipe df |>
DataFrames.transform(_, :a => (x -> ismissing.(x) ? "control" : x) )
ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context
P.S. I know I can use the @transform macro, but I don't find it very elegant. I think this replacement should be possible in one line of code.
julia> @pipe df |>
@transform(_, :a, x = replace(:a, missing => "control")) |>
select(_, Not(:a)) |>
rename(_, :x => :a)
2×2 DataFrame
│ Row │ b │ a │
│ │ Int64 │ String │
├─────┼───────┼───────────┤
│ 1 │ 2 │ control │
│ 2 │ 3 │ treatment │
Reputation: 69819
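Here are four ways to do it. Each passes a :a => function => :a transformation to select, so the result is still a DataFrame and the column keeps its name.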
julia> @pipe df |>
select(_, :a => ByRow(x -> coalesce(x, "control")) => :a, :b)
2×2 DataFrame
Row │ a b
│ String Int64
─────┼──────────────────
1 │ control 2
2 │ treatment 3
julia> @pipe df |>
select(_, :a => (x -> coalesce.(x, "control")) => :a, :b)
2×2 DataFrame
Row │ a b
│ String Int64
─────┼──────────────────
1 │ control 2
2 │ treatment 3
julia> @pipe df |>
select(_, :a => (x -> replace(x, missing => "control")) => :a, :b)
2×2 DataFrame
Row │ a b
│ String Int64
─────┼──────────────────
1 │ control 2
2 │ treatment 3
julia> @pipe df |>
select(_, :a => ByRow(x -> ismissing(x) ? "control" : x) => :a, :b)
2×2 DataFrame
Row │ a b
│ String Int64
─────┼──────────────────
1 │ control 2
2 │ treatment 3
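As a rough sketch of why the attempts in the question behave the way they do, the snippet below uses only Base Julia. The vector col is a hypothetical stand-in for df.a, and the variable names are illustrative. It shows that a plain function receives the whole column, that broadcasting ismissing gives a vector of Bools rather than a single Bool, and three element-wise forms that handle a whole column.

```julia
# Base Julia only; `col` is a hypothetical stand-in for the column `df.a`.
col = Union{Missing, String}[missing, "treatment"]

# A plain function in `:a => f` is called once with the whole column,
# so `ismissing` tests the vector itself, and a vector is never `missing`.
whole_column = ismissing(col)

# Broadcasting tests each element, but the result is a vector of Bools,
# which `cond ? a : b` cannot use as its condition (hence the TypeError).
per_element = ismissing.(col)

# Element-wise alternatives that work on a whole column:
via_coalesce = coalesce.(col, "control")                 # first non-missing argument
via_ifelse   = ifelse.(ismissing.(col), "control", col)  # broadcasted conditional
via_replace  = replace(col, missing => "control")        # value substitution

println(whole_column)
println(per_element)
println(via_coalesce)
```

In the select calls above, ByRow applies the function to each element in turn, while the coalesce. and replace variants do the element-wise work inside a function that receives the whole column.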