How to handle missing in boolean context in Julia?

Question

I'm trying to create a categorical variable based on ranges of values from another (numerical) column. However, the code doesn't work when the numerical column contains missing values.

Here is a reproducible example:

using RDatasets;
using DataFrames;
using Pipe;
using FreqTables;

df = dataset("datasets","iris")
#lowercase columns just for convenience
@pipe df |> rename!(_, [lowercase(k) for k in names(df)]);

#without this line, the code works fine
@pipe df |> allowmissing!(_, :sepallength) |> replace!(_.sepallength, 4.9 => missing);

df[:size] = @. ifelse(df[:sepallength]<=4.7, "small", missing)
df[:size] = @. ifelse((df[:sepallength]>4.7) & (df[:sepallength]<=4.9), "avg", df[:size])
df[:size] = @. ifelse((df[:sepallength]>4.9) & (df[:sepallength]<=5), "large", df[:size])
df[:size] = @. ifelse(df[:sepallength]>5, "huge", df[:size])

println(@pipe df |> freqtable(_, :size))

Output:

TypeError: non-boolean (Missing) used in boolean context

I would like to ignore the missing cases in the numerical variable, but I cannot simply drop the missing values because that would also drop other important information in my dataset. Moreover, if I dropped only the missing values in sepallength, the column df[:size] would have a different length than the original data frame.

Bogumił Kamiński · Accepted Answer

Use the coalesce function like this:

julia> x = [1,2,3,missing,5,6,7]
7-element Array{Union{Missing, Int64},1}:
 1
 2
 3
  missing
 5
 6
 7

julia> @. ifelse(coalesce(x < 4.7, false), "small", missing)
7-element Array{Union{Missing, String},1}:
 "small"
 "small"
 "small"
 missing
 missing
 missing
 missing
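The error in the question arises because a comparison such as missing <= 4.7 evaluates to missing rather than to a Bool, and ifelse only accepts a Bool condition. As a minimal sketch of how the same coalesce pattern could be applied to all four ranges from the question, the snippet below uses a plain vector named sepallength as a hypothetical stand-in for the data frame column, and labels as a hypothetical name for the result; it relies only on Base Julia.

```julia
# Hypothetical stand-in for the sepallength column, with one missing entry
sepallength = [4.6, 4.8, missing, 5.0, 5.4]

# Comparing against missing yields missing, not true or false
@assert ismissing(missing <= 4.7)

# coalesce(cond, false) turns a missing condition into false,
# so rows with a missing sepallength keep their previous label (missing)
labels = @. ifelse(coalesce(sepallength <= 4.7, false), "small", missing)
labels = @. ifelse(coalesce((sepallength > 4.7) & (sepallength <= 4.9), false), "avg", labels)
labels = @. ifelse(coalesce((sepallength > 4.9) & (sepallength <= 5), false), "large", labels)
labels = @. ifelse(coalesce(sepallength > 5, false), "huge", labels)

# labels is now ["small", "avg", missing, "large", "huge"]
println(labels)
```

The result has the same length as the input, and the entry for the missing value stays missing, which is the behaviour the question asks for.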

As a side note, do not write df[:size] (this syntax has been deprecated for over 2 years now and will soon throw an error). Instead, use df.size or df."size" to access a column of the data frame. The df."size" form is for column names that contain characters such as spaces, e.g. df."my fancy column!".
