René

Reputation: 4827

Select rows with missing value in a Julia dataframe

I've just started exploring Julia and am struggling with subsetting dataframes. I would like to select rows where LABEL has the value "B" and VALUE is missing. Selecting rows with "B" works fine, but trying to add a filter for missing fails. Any suggestions on how to solve this? Tips for good documentation on subsetting/filtering dataframes in Julia are welcome. I haven't found a solution in the Julia documentation.

using DataFrames
df = DataFrame(ID = 1:5, LABEL = ["A", "A", "B", "B", "B"], VALUE = ["A1", "A2", "B1", "B2", missing])
df[df[:LABEL] .== "B", :] # works fine
df[df[:LABEL] .== "B" && df[:VALUE] .== missing, :] # fails

Upvotes: 4

Views: 1429

Answers (1)

Bogumił Kamiński

Reputation: 69949

Use:

filter([:LABEL, :VALUE] => (l, v) -> l == "B" && ismissing(v), df)

(a very similar example is given in the documentation of the filter function).

If you want to use getindex then write:

df[(df.LABEL .== "B") .& ismissing.(df.VALUE), :]

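The need for .& instead of && when working with arrays is not specific to DataFrames.jl. It is a common pattern in Julia whenever you index arrays with Booleans: && is a short-circuit operator that expects a single Bool on each side, whereas .& applies the AND element by element. Likewise, use ismissing rather than == missing, because comparing anything to missing with == returns missing, not true or false.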
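To see the two points in isolation, here is a minimal sketch using plain vectors in place of the data frame columns (the names label, value, eq, mask and selected are only for illustration and are not part of the original code). It needs nothing beyond Base Julia:

```julia
# Plain vectors standing in for the LABEL and VALUE columns from the question.
label = ["A", "A", "B", "B", "B"]
value = ["A1", "A2", "B1", "B2", missing]

# Comparing with == against missing propagates missing instead of giving a Bool,
# so this vector cannot be used as a row mask.
eq = value .== missing

# ismissing returns a plain Bool, and .& combines the two masks element by element.
# The parentheses matter because .& binds more tightly than .==
mask = (label .== "B") .& ismissing.(value)

# Positions where both conditions hold.
selected = findall(mask)
```

The same mask can then be used as the row index of a data frame, as in the getindex example above.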

Upvotes: 5
