Reputation: 4864
I have a Julia matrix (I can convert it to a DataFrame, of course, if that helps) and I want to drop all rows and columns that contain NaN values. Googling has not been helpful. In pandas this is trivial: df.dropna().dropna(axis=1)
Upvotes: 2
Views: 1097
Reputation: 69869
Here is an answer for DataFrames.jl. To drop rows and columns having missing values do the following respectively:
julia> using DataFrames
julia> df = DataFrame(a=[1, 2, missing], b=[1, missing, 3], c=[1, 2, 3])
3×3 DataFrame
 Row │ a        b        c
     │ Int64?   Int64?   Int64
─────┼─────────────────────────
   1 │       1        1      1
   2 │       2  missing      2
   3 │ missing        3      3
julia> dropmissing(df)
1×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
julia> df[all.(!ismissing, eachrow(df)), :] # the same using 2-dimensional indexing
1×3 DataFrame
 Row │ a       b       c
     │ Int64?  Int64?  Int64
─────┼───────────────────────
   1 │      1       1      1
julia> select(df, all.(!ismissing, eachcol(df)))
3×1 DataFrame
 Row │ c
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> df[:, all.(!ismissing, eachcol(df))] # the same using 2-dimensional indexing
3×1 DataFrame
 Row │ c
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
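The question mentions NaN rather than missing. The same pattern appears to carry over if the predicate is swapped for isnan. The following is only a sketch, assuming every column holds floating-point values (isnan is not defined for strings, for instance); df_nan, rows_ok and cols_ok are hypothetical names used for illustration.

```julia
using DataFrames

# Hypothetical data: same shape as the example above, with NaN in place of missing.
df_nan = DataFrame(a=[1.0, 2.0, NaN], b=[1.0, NaN, 3.0], c=[1.0, 2.0, 3.0])

# Rows: keep a row only if none of its values is NaN.
rows_ok = filter(row -> !any(isnan, row), df_nan)

# Columns: keep a column only if none of its values is NaN.
cols_ok = select(df_nan, [!any(isnan, col) for col in eachcol(df_nan)])
```

With this data, rows_ok keeps only the first row and cols_ok keeps only column c.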
Note that it is much easier to drop rows than columns. The reason is a design decision in DataFrames.jl: most functions treat a data frame as a collection of rows, and the dropmissing function is an example of this. The major exceptions are the select, transform and combine functions, which work on columns.
For matrices this is similar, but since they do not favor rows over columns the way data frames do, you can do e.g.:
julia> mat = Matrix(df)
3×3 Array{Union{Missing, Int64},2}:
1 1 1
2 missing 2
missing 3 3
julia> mat[all.(!ismissing, eachrow(mat)), :]
1×3 Array{Union{Missing, Int64},2}:
1 1 1
julia> mat[:, all.(!ismissing, eachcol(mat))]
3×1 Array{Union{Missing, Int64},2}:
1
2
3
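For a plain floating-point matrix that holds NaN instead of missing, a Boolean mask built with isnan should work the same way and needs no packages. This is a minimal sketch, assuming a Float64 matrix; nan_mat, keep_rows and keep_cols are hypothetical names, not part of any library.

```julia
# Hypothetical example matrix; NaN marks the entries to filter on.
nan_mat = [1.0 1.0 1.0;
           2.0 NaN 2.0;
           NaN 3.0 3.0]

# Boolean masks: true for rows/columns that contain no NaN.
keep_rows = [!any(isnan, row) for row in eachrow(nan_mat)]
keep_cols = [!any(isnan, col) for col in eachcol(nan_mat)]

rows_without_nan = nan_mat[keep_rows, :]   # 1×3 matrix: only the first row survives
cols_without_nan = nan_mat[:, keep_cols]   # 3×1 matrix: only the third column survives
```

Both masks are computed from the original matrix, so rows and columns are filtered independently. Chaining the two steps as in the pandas one-liner would instead drop rows first and then look for NaN columns in what is left.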
Upvotes: 5