Skeppet

Reputation: 981

Proper way to test for NA in Julia DataFrames

What is the proper way to test if a value in a DataFrame is NA in the Julia DataFrames package?

So far I have found that typeof(var) == NAtype works, but is there a more elegant way of doing it?

Upvotes: 10

Views: 2326

Answers (2)

Shayan

Reputation: 6295

In addition to @jub0bs's answer: if you want to check whether a DataFrame contains any NaN values (floating-point "not a number", which is not the same thing as NA), the following code can help:

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10×2 DataFrame
 Row │ A      B
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      4
   3 │     3      6
   4 │     4      8
   5 │     5     10
   6 │     6     12
   7 │     7     14
   8 │     8     16
   9 │     9     18
  10 │    10     20


julia> any(isnan.(Matrix(df)))
false

This means there aren't any NaN values in the given DataFrame!
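To make the NaN check easier to try without building a DataFrame, here is a minimal sketch on plain vectors standing in for columns. It uses only Base Julia (1.0 or later is assumed); the helper name hasnan is hypothetical and not part of any package. It also guards against missing entries, because isnan(missing) returns missing rather than a Bool.

```julia
# Plain vectors stand in for DataFrame columns; no packages needed.
clean   = [1.0, 2.0, 3.0]
withnan = [1.0, NaN, 3.0]
mixed   = [1.0, missing, NaN]

# Broadcasting isnan gives one Bool per element; any() reduces them.
any(isnan.(clean))     # false
any(isnan.(withnan))   # true

# isnan(missing) is missing, so test ismissing first.
# `hasnan` is a hypothetical helper name used only for this sketch.
hasnan(col) = any(x -> !ismissing(x) && isnan(x), col)

hasnan(mixed)          # true
```

Passing a predicate to any, as hasnan does, avoids allocating the intermediate Boolean array that isnan.(...) creates.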

Upvotes: 0

jub0bs

Reputation: 66244

Using typeof(var) == NAtype for this is awkward, in particular because it is not vectorized.

The canonical way of testing for NA values is to use the (vectorized) function called isna.

Example

Let's generate a toy DataFrame with some NA values in the B column:

julia> using DataFrames

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | 2  |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | 8  |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | 16 |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

julia> df[[1,4,8],symbol("B")] = NA
NA

julia> df
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | NA |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | NA |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | NA |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

Now let's pretend we don't know the contents of our DataFrame and ask, for example, the following question:

Does column B contain any NA values?

The typeof approach won't work here, because it inspects the type of the whole column rather than the type of each element:

julia> typeof(df[:,symbol("B")]) == NAtype
false

The isna function is better suited:

julia> any(isna(df[:,symbol("B")]))
true
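For readers on a recent setup: since DataFrames 0.11, NA has been replaced by Julia's built-in missing value, and the counterpart of isna is ismissing. The following is a rough sketch of the same test under that assumption, using a plain vector in place of column B so that it runs with Base Julia alone (1.0 or later). Unlike isna, ismissing is not vectorized on its own and has to be broadcast with a dot.

```julia
# A plain vector stands in for column B; entries 1, 4 and 8 are missing.
B = Union{Int,Missing}[missing, 4, 6, missing, 10, 12, 14, missing, 18, 20]

# ismissing works on a single value; the dot broadcasts it over B.
mask = ismissing.(B)     # one Bool per element

any(mask)                # true: the column has missing values
any(ismissing, B)        # same test without building the mask
count(ismissing, B)      # 3
findall(mask)            # [1, 4, 8]
```

With an actual DataFrame, the same calls should apply to a column extracted as a vector, for example df.B.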

Upvotes: 11
