Igor Rivin

Reputation: 4864

analogue of pandas `dropna` in Julia

I have a Julia matrix (I can convert it to a data frame, of course, if that helps), and I want to drop all rows and columns that contain NaN values. Google has not been helpful. In pandas this is trivial: `df.dropna().dropna(axis=1)`

Upvotes: 2

Views: 1097

Answers (1)

Bogumił Kamiński

Reputation: 69869

Here is an answer for DataFrames.jl. To drop the rows or the columns that contain missing values, do the following, respectively:

julia> using DataFrames

julia> df = DataFrame(a=[1, 2, missing], b=[1, missing, 3], c=[1, 2, 3])
3×3 DataFrame
 Row │ a        b        c
     │ Int64?   Int64?   Int64
─────┼─────────────────────────
   1 │       1        1      1
   2 │       2  missing      2
   3 │ missing        3      3

julia> dropmissing(df)
1×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1

julia> df[all.(!ismissing, eachrow(df)), :] # the same using 2-dimensional indexing
1×3 DataFrame
 Row │ a       b       c
     │ Int64?  Int64?  Int64
─────┼───────────────────────
   1 │      1       1      1

julia> select(df, all.(!ismissing, eachcol(df)))
3×1 DataFrame
 Row │ c
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> df[:, all.(!ismissing, eachcol(df))] # the same using 2-dimensional indexing
3×1 DataFrame
 Row │ c
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

Note that it is much easier to drop rows than columns. This follows from a design decision in DataFrames.jl: most functions treat a data frame as a collection of rows, and dropmissing is one such function.

The major exceptions are:

  • indexing (which is always two dimensional)
  • select, transform and combine functions which work on columns
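The row/column split described above can be sketched as follows. This is only an illustration, assuming a reasonably recent DataFrames.jl; the variable names `rows_kept`, `complete` and `cols_kept` are made up for the example. `filter` hands the predicate one row at a time, while `select` takes a column selector:

```julia
using DataFrames

df = DataFrame(a=[1, 2, missing], b=[1, missing, 3], c=[1, 2, 3])

# Row-oriented: the predicate receives one row (a DataFrameRow) at a time.
rows_kept = filter(row -> !any(ismissing, row), df)

# Column-oriented: build a list of column names first, then pass it to select.
# `complete` holds the names of the columns that contain no missing values.
complete = [n for n in names(df) if !any(ismissing, df[!, n])]
cols_kept = select(df, complete)
```

Unlike dropmissing, filter leaves the column element types unchanged, so the columns of `rows_kept` can still hold missing.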

For matrices the approach is similar, but since they do not favor rows over columns the way data frames do, both directions use the same indexing pattern, e.g.:

julia> mat = Matrix(df)
3×3 Array{Union{Missing, Int64},2}:
 1         1         1
 2          missing  2
  missing  3         3

julia> mat[all.(!ismissing, eachrow(mat)), :]
1×3 Array{Union{Missing, Int64},2}:
 1  1  1

julia> mat[:, all.(!ismissing, eachcol(mat))]
3×1 Array{Union{Missing, Int64},2}:
 1
 2
 3
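The question asks about NaN rather than missing. The same masking pattern should carry over by swapping ismissing for isnan. Here is a minimal sketch using only Base Julia, assuming a floating-point matrix; the matrix values and the names `good_rows` and `good_cols` are made up for illustration:

```julia
# Hypothetical example matrix; NaN marks the entries to drop.
mat = [1.0 1.0 1.0;
       2.0 NaN 2.0;
       NaN 3.0 3.0]

# Boolean masks: true for each row / column that contains no NaN.
good_rows = [!any(isnan, r) for r in eachrow(mat)]
good_cols = [!any(isnan, c) for c in eachcol(mat)]

rows_dropped = mat[good_rows, :]   # 1×3 matrix: only the first row survives
cols_dropped = mat[:, good_cols]   # 3×1 matrix: only the third column survives
```

Both results are computed from the original matrix, as in the examples above. Note that once every row containing a NaN has been dropped, no column can contain one either, so chaining the two steps as in the pandas one-liner gives the same result as dropping rows alone.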

Upvotes: 5
