Antonello

Reputation: 6431

How to query (filter) a Julia DataFrame when some keys have NA values?

How can I query (filter) a Julia DataFrame when some of the filters may explicitly ask for an NA value?

E.g.

using DataFrames

df = DataFrame(
       a = ["aa", "bb","zz", "cc","cc"],
       b = [1, 2, 2, 3, 0],
       v = [12,13,14,15,16]
)

df[3,1]  = NA
df[5,2]  = NA

5×3 DataFrames.DataFrame
│ Row │ a    │ b  │ v  │
├─────┼──────┼────┼────┤
│ 1   │ "aa" │ 1  │ 12 │
│ 2   │ "bb" │ 2  │ 13 │
│ 3   │ NA   │ 2  │ 14 │
│ 4   │ "cc" │ 3  │ 15 │
│ 5   │ "cc" │ NA │ 16 │

Neither of these two approaches works:

# From DataFramesMeta.jl..
test = @where(df, :b .== 2, :a .== NA)

# From Query.jl..
x = @from i in df begin
    @where (i.a == "cc") && (i.b == NA)
    @select {i.a,i.b,i.v}
    @collect DataFrame
end

The problem is that I need to put this into a function where I don't know a priori whether the requested filter parameters are actual values or NAs, so to use isna() I would have to branch on every single filter parameter.

Upvotes: 2

Views: 397

Answers (1)

Antonello

Reputation: 6431

I learned that both (NA == NA) and (NA == anything else) return NA instead of the expected Boolean value.

To get the expected behaviour (and use the comparison to query DataFrames where NA values are either present in the data or requested by the filter), one can use isequal(a,b), which always returns true or false, e.g.:

test = @where(df, isequal.(:a,"cc"), isequal.(:b,NA) ) #from DataFramesMeta.jl
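
For anyone reading this on a recent Julia version, here is a minimal sketch of the same idea using only Base. It assumes `missing` in place of NA (current Julia and DataFrames use `missing`), works on plain vectors rather than a DataFrame to stay self-contained, and `matches` is a hypothetical helper name of mine, not part of any package:

```julia
# Columns from the question, with `missing` standing in for NA (assumption).
a = ["aa", "bb", missing, "cc", "cc"]
b = [1, 2, 2, 3, missing]
v = [12, 13, 14, 15, 16]

# `==` propagates missing, so it cannot be used directly as a filter...
eq_result = (missing == missing)        # missing, not true

# ...whereas isequal always returns a Bool.
# Hypothetical helper: the caller does not need to know in advance
# whether `wanted` is an actual value or missing.
matches(col, wanted) = isequal.(col, wanted)

# Rows where a is "cc" and b is missing.
mask = matches(a, "cc") .& matches(b, missing)
selected = v[mask]                      # [16]
```

Because the same `matches` call handles both cases, the filter function does not need a separate isna()-style branch for each parameter.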

Upvotes: 1
