Bogumił Kamiński

Find a subset of columns of a data frame that have some missing values

Given the following data frame from DataFrames.jl:

julia> using DataFrames

julia> df = DataFrame(x1=[1, 2, 3], x2=Union{Int,Missing}[1, 2, 3], x3=[1, 2, missing])
3×3 DataFrame
 Row │ x1     x2      x3
     │ Int64  Int64?  Int64?
─────┼────────────────────────
   1 │     1       1        1
   2 │     2       2        2
   3 │     3       3  missing

I would like to find the columns that contain at least one missing value.

I have tried:

julia> names(df, Missing)
String[]

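but this returns an empty vector because the names function, when passed a type, selects the columns whose element type is a subtype of that type, and no column here has an element type that is a subtype of Missing.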
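The subtype relation behind this result can be checked in plain Julia, without DataFrames.jl. This is a minimal sketch that uses the element types of the three columns above; the variable names are only illustrative.

```julia
# Element types of columns x1, x2 and x3 from the data frame above
eltypes = [Int, Union{Int, Missing}, Union{Int, Missing}]

# names(df, Missing) keeps a column only if its element type is a subtype of Missing
is_subtype = [T <: Missing for T in eltypes]

# The reverse relation asks whether Missing fits into the element type
allows_missing = [Missing <: T for T in eltypes]

println(is_subtype)
println(allows_missing)
```

Every entry of is_subtype is false, which matches the empty result. Only the reverse relation picks out x2 and x3.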

Answers (1)

Bogumił Kamiński

To find the columns that actually contain a missing value, use:

julia> names(df, any.(ismissing, eachcol(df)))
1-element Vector{String}:
 "x3"

In this approach we iterate over each column of the df data frame and check whether it contains at least one missing value.
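To see what is passed to names, the Boolean mask can be built as a separate step. This is a sketch that assumes the df from the question; the names has_missing and cols_with_missing are only illustrative.

```julia
using DataFrames

df = DataFrame(x1=[1, 2, 3], x2=Union{Int,Missing}[1, 2, 3], x3=[1, 2, missing])

# One Bool per column: true if the column holds at least one missing value
has_missing = [any(ismissing, col) for col in eachcol(df)]

# names accepts a Bool vector with one entry per column and keeps the true ones
cols_with_missing = names(df, has_missing)
```

Keeping the mask in a variable also makes it easy to inspect or to reuse for other column selections.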

To find the columns that can potentially contain a missing value, check their element type:

julia> names(df, [eltype(col) >: Missing for col in eachcol(df)]) # using a comprehension
2-element Vector{String}:
 "x2"
 "x3"

julia> names(df, .>:(eltype.(eachcol(df)), Missing)) # using broadcasting
2-element Vector{String}:
 "x2"
 "x3"
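The difference between the two checks can be seen on plain vectors. This is a minimal sketch that mirrors the x2 and x3 columns of the question, without DataFrames.jl; the result variable names are only illustrative.

```julia
x2 = Union{Int, Missing}[1, 2, 3]
x3 = [1, 2, missing]

# Element-type check: both vectors are allowed to store missing
x2_allows = eltype(x2) >: Missing
x3_allows = eltype(x3) >: Missing

# Value check: only x3 actually stores a missing value
x2_has = any(ismissing, x2)
x3_has = any(ismissing, x3)
```

Here x2 passes the element-type check but fails the value check, which is why it appears only in the second result.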

