Jonas

Reputation: 1529

How to replace missing values in a Julia pipe

I want to replace missing values in a pipe. I'm aware of how to do it outside a pipe (see this post for more on that).

julia> df = DataFrame(:b => [2, 3], :a => [missing, "treatment"])
2×2 DataFrame
│ Row │ b     │ a         │
│     │ Int64 │ String?   │
├─────┼───────┼───────────┤
│ 1   │ 2     │ missing   │
│ 2   │ 3     │ treatment │

julia> df.a = replace(df.a, missing => "control")  # works as expected


julia> @pipe df |>
             replace(_.a, missing => "control") |>
             select(_, :a, :b) # error because replace doesn't return a DataFrame.

I tried writing a transformation function, but ismissing() doesn't work in this context. I assume this is because it gets handed a whole column and not the separate values. But ismissing.() doesn't work either. Does anybody have an idea how I can make this transformation work?

julia> ismissing(df.a[1])
true

julia> @pipe df |>
             DataFrames.transform(_, :a => x -> ismissing(x) ? "control" : x)
2×3 DataFrame
│ Row │ b     │ a         │ a_function │
│     │ Int64 │ String?   │ String?    │
├─────┼───────┼───────────┼────────────┤
│ 1   │ 2     │ missing   │ missing    │
│ 2   │ 3     │ treatment │ treatment  │

julia> @pipe df |>
             DataFrames.transform(_, :a => x -> ismissing(x) ? x : "control" )
2×3 DataFrame
│ Row │ b     │ a         │ a_function │
│     │ Int64 │ String?   │ String     │
├─────┼───────┼───────────┼────────────┤
│ 1   │ 2     │ missing   │ control    │
│ 2   │ 3     │ treatment │ control    │

julia> @pipe df |>
             DataFrames.transform(_, :a => (x -> ismissing.(x) ? "control" : x) )
ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context

P.S. I know I can use the @transform macro, but I don't find the result very elegant. I think this replacement should be possible in one line of code.


julia> @pipe df |>
             @transform(_, :a, x = replace(:a, missing => "control")) |>
             select(_, Not(:a)) |>
             rename(_, :x => :a)
2×2 DataFrame
│ Row │ b     │ a         │
│     │ Int64 │ String    │
├─────┼───────┼───────────┤
│ 1   │ 2     │ control   │
│ 2   │ 3     │ treatment │

Upvotes: 1

Views: 186

Answers (1)

Bogumił Kamiński

Reputation: 69819

Here are four sample ways to do it.

julia> @pipe df |>
             select(_, :a => ByRow(x -> coalesce(x, "control")) => :a, :b)
2×2 DataFrame
 Row │ a          b
     │ String     Int64
─────┼──────────────────
   1 │ control        2
   2 │ treatment      3

julia> @pipe df |>
             select(_, :a => (x -> coalesce.(x, "control")) => :a, :b)
2×2 DataFrame
 Row │ a          b
     │ String     Int64
─────┼──────────────────
   1 │ control        2
   2 │ treatment      3

julia> @pipe df |>
             select(_, :a => (x -> replace(x, missing => "control")) => :a, :b)
2×2 DataFrame
 Row │ a          b
     │ String     Int64
─────┼──────────────────
   1 │ control        2
   2 │ treatment      3

julia> @pipe df |>
             select(_, :a => ByRow(x -> ismissing(x) ? "control" : x) => :a, :b)
2×2 DataFrame
 Row │ a          b
     │ String     Int64
─────┼──────────────────
   1 │ control        2
   2 │ treatment      3
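
As for why the attempts in the question behave the way they do: without ByRow, the function receives the whole column, so ismissing(x) is false for the vector itself, and ismissing.(x) yields one Bool per element, which ?: cannot branch on. Here is a minimal sketch of that in plain Base Julia, with no DataFrames involved; the names col and a1 to a4 are only for illustration.

```julia
# Same values as column `a` in the question (illustrative variable name)
col = [missing, "treatment"]

# A function applied to the whole column sees the vector,
# and a vector is never itself `missing`
whole_column_check = ismissing(col)

# Broadcasting returns one Bool per element; this array cannot be
# used as the condition of `?:`, hence the TypeError in the question
per_element_check = ismissing.(col)

# Element-wise alternatives that do what was intended
a1 = coalesce.(col, "control")                     # broadcast coalesce
a2 = replace(col, missing => "control")            # replace on the vector
a3 = ifelse.(ismissing.(col), "control", col)      # broadcast ifelse
a4 = map(x -> ismissing(x) ? "control" : x, col)   # per-element ternary
```

Inside select or transform, the same idea applies: either wrap the per-element function in ByRow or broadcast inside the function, and add => :a as the target so the result lands in column a rather than in a new a_function column.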

Upvotes: 1
