How can I query (filter) a Julia DataFrame when some filters may explicitly ask for an NA value?
For example:
using DataFrames
df = DataFrame(
a = ["aa", "bb","zz", "cc","cc"],
b = [1, 2, 2, 3, 0],
v = [12,13,14,15,16]
)
df[3,1] = NA
df[5,2] = NA
5×3 DataFrames.DataFrame
│ Row │ a    │ b  │ v  │
├─────┼──────┼────┼────┤
│ 1   │ "aa" │ 1  │ 12 │
│ 2   │ "bb" │ 2  │ 13 │
│ 3   │ NA   │ 2  │ 14 │
│ 4   │ "cc" │ 3  │ 15 │
│ 5   │ "cc" │ NA │ 16 │
Neither of these two approaches works:
# From DataFramesMeta.jl (the DataFrame is the first argument to @where):
test = @where(df, :b .== 2, :a .== NA)
# From Query.jl:
x = @from i in df begin
    @where (i.a == "cc") && (i.b == NA)
    @select {i.a, i.b, i.v}
    @collect DataFrame
end
The problem is that this has to go inside a function where I don't know a priori whether the requested filter values are actual values or NAs, so using isna() would force me to branch on every single filter parameter.
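To make the failure concrete, here is a minimal sketch in current Julia (1.0+), where NA has been superseded by `missing`, which likewise propagates through `==`. It assumes plain vectors standing in for the DataFrame columns so that it runs with Base alone; `matches_branching` is a hypothetical helper that shows the per-parameter branching the question wants to avoid.

```julia
# Assumption: modern Julia, where `missing` plays the role of the old NA.
# Plain vectors stand in for the DataFrame columns `a` and `b`.
a = ["aa", "bb", missing, "cc", "cc"]
b = [1, 2, 2, 3, missing]

# Comparing anything with `missing` yields `missing`, so this "mask"
# contains no Bool values at all and cannot select rows.
mask_b = b .== missing        # [missing, missing, missing, missing, missing]

# Hypothetical helper: the branching approach, one code path per filter value.
# `coalesce` turns the `missing` from e.g. `missing == "cc"` into `false`.
matches_branching(col, val) =
    val === missing ? ismissing.(col) : coalesce.(col .== val, false)

# Rows where a == "cc" and b is missing.
rows = findall(matches_branching(a, "cc") .& matches_branching(b, missing))  # [5]
```

This works, but every filter needs the `val === missing` test, which is exactly the bifurcation the question describes.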
I learned that both NA == NA and NA == anything_else return NA instead of the expected Boolean value.
To get the expected behaviour (and to use the comparison to query DataFrames where NA values are either present or requested), one can use isequal(a, b), which returns a Boolean even when NA is involved, e.g.:
test = @where(df, isequal.(:a, "cc"), isequal.(:b, NA))  # from DataFramesMeta.jl
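The same isequal idea extends to the generic function described in the question. This is a hedged sketch, not the DataFramesMeta.jl API: it assumes Julia 1.0+ with `missing` in place of NA and a NamedTuple of vectors in place of a DataFrame, and `filter_rows` is a hypothetical name. Because isequal always returns a Bool and treats `missing` as equal to itself, every criterion goes through the same line whether its value is real or missing.

```julia
# Assumption: a NamedTuple of equal-length vectors stands in for the DataFrame,
# and `missing` stands in for NA.
cols = (a = ["aa", "bb", missing, "cc", "cc"],
        b = [1, 2, 2, 3, missing],
        v = [12, 13, 14, 15, 16])

# Hypothetical helper: indices of rows where every requested column equals
# the requested value. No branch on `missing` is needed, because
# isequal(missing, missing) is true and isequal(missing, x) is false otherwise.
function filter_rows(cols::NamedTuple, criteria::Pair...)
    keep = trues(length(first(cols)))
    for (col, val) in criteria
        keep .&= isequal.(cols[col], val)
    end
    return findall(keep)
end

filter_rows(cols, :a => "cc", :b => missing)   # [5]
filter_rows(cols, :b => 2, :a => missing)      # [3]
```

The returned indices can then select rows from any column, e.g. cols.v[filter_rows(cols, :a => "cc", :b => missing)] gives [16].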