Reputation: 1921
I want to know how to omit NA values in a data frame, but only in some columns I am interested in.
For example,
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
but I only want to omit the rows where y is NA, so the result should be
  x  y  z
1 1  0 NA
2 2 10 33
na.omit seems to delete all rows that contain any NA.
Can somebody help me out with this simple question?
Now suppose I change the question to:
DF <- data.frame(x = c(1, 2, 3,NA), y = c(1,0, 10, NA), z=c(43,NA, 33, NA))
If I want to omit only the rows where x is NA or z is NA, where do I put the | in the function?
Upvotes: 177
Views: 327837
Reputation: 41601
You don't need to wrap complete.cases in a custom function to remove the rows with NA in a certain column. Here is a reproducible example:
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF
#> x y z
#> 1 1 0 NA
#> 2 2 10 33
#> 3 3 NA 22
DF[complete.cases(DF$y),]
#> x y z
#> 1 1 0 NA
#> 2 2 10 33
Created on 2022-08-27 with reprex v2.0.2
As you can see, it removed the row with NA in the y column.
Upvotes: 1
Reputation: 8826
To update, a tidyverse approach with dplyr:
library(dplyr)
your_data_frame %>%
  filter(!is.na(region_column))
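Applied to the second DF from the question, that could look like the sketch below. It assumes dplyr is installed; res_y and res_xz are just illustrative names. Conditions separated by commas in filter() are combined with AND, so the second call keeps only rows where both x and z are present.

```r
library(dplyr)

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Drop rows where y is NA (NAs in other columns are left alone)
res_y <- DF %>% filter(!is.na(y))

# Drop rows where x is NA or z is NA
res_xz <- DF %>% filter(!is.na(x), !is.na(z))
```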
Upvotes: 2
Reputation: 137
Just try this:
DF %>% t %>% na.omit %>% t
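It transposes the data frame, omits the rows containing NA (which were columns before transposition), and then transposes the result back. Note that this drops the columns that contain NA rather than the rows, and that t returns a matrix, not a data frame.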
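To see what the transposition trick actually returns, here is a small base R sketch on the DF from the question. It uses nested calls instead of the pipe so no package is needed; res is an illustrative name.

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Base R equivalent of DF %>% t %>% na.omit %>% t
res <- t(na.omit(t(DF)))

is.matrix(res)  # the result is a matrix, not a data frame
colnames(res)   # only the columns without any NA remain
nrow(res)       # no rows have been removed
```

So this is the approach to reach for when you want to drop columns that contain NA, not rows.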
Upvotes: 2
Reputation: 179
It is possible to use na.omit for data.table:
na.omit(data, cols = c("x", "z"))
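A self-contained version with the second example from the question might look like this. It assumes the data.table package is installed, and DT and res are illustrative names. The cols argument belongs to the data.table method of na.omit, so the object has to be a data.table for it to take effect.

```r
library(data.table)

DT <- data.table(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Drop rows where x or z is NA; NAs in y are not considered
res <- na.omit(DT, cols = c("x", "z"))
```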
Upvotes: 17
Reputation: 5408
Omit a row if either of two specific columns contains NA.
DF[!is.na(DF$x) & !is.na(DF$z), ]
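As for where the | goes: the same filter can be written by negating an OR of the two is.na tests. A quick sketch on the second DF from the question (keep_and and keep_or are illustrative names):

```r
DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Keep rows where both x and z are present
keep_and <- DF[!is.na(DF$x) & !is.na(DF$z), ]

# Same filter written with |: drop rows where x is NA or z is NA
keep_or <- DF[!(is.na(DF$x) | is.na(DF$z)), ]

# Both forms select the same rows
identical(keep_and, keep_or)
```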
Upvotes: 7
Reputation: 19454
You could use the complete.cases function and put it into a function thusly:
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}
completeFun(DF, "y")
# x y z
# 1 1 0 NA
# 2 2 10 33
completeFun(DF, c("y", "z"))
# x y z
# 2 2 10 33
EDIT: Only return rows with no NAs
If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:
DF[complete.cases(DF), ]
# x y z
# 2 2 10 33
Or if completeFun is already ingrained in your workflow ;)
completeFun(DF, names(DF))
Upvotes: 95
Reputation: 6335
Hadley's tidyr just got this amazing function, drop_na:
library(tidyr)
DF %>% drop_na(y)
#   x  y  z
# 1 1  0 NA
# 2 2 10 33
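drop_na also accepts several columns at once, which covers the follow-up case in the question (omit rows where x or z is NA). A minimal sketch, assuming tidyr is installed; res is an illustrative name:

```r
library(tidyr)

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Drop rows where x or z is NA; NAs in y are ignored
res <- drop_na(DF, x, z)
```

Called with no column names, drop_na(DF) drops every row that has an NA anywhere.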
Upvotes: 112
Reputation: 1013
Use 'subset'
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))
Upvotes: 36
Reputation: 115485
Use is.na
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]
Upvotes: 246