Reputation: 57
My data set has missing values marked as 'XXX'.
I have tried na.omit(mydata), without success.
df <- data.frame(X=factor(c(0.2, "XXX", 0.4, 0.1)), Y=factor(c(0.8, 1, 0.9, "XXX")))
Here X and Y are factors. I found that the missing data is encoded as "XXX" by checking the levels of the factors.
I want to remove rows 2 and 4. Can someone help? I have been trying for a while now.
Upvotes: 0
Views: 166
Reputation: 35739
You don't need to convert "XXX" to NA. Just filter out "XXX" directly.
library(dplyr)
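# if_all() keeps rows where the condition holds in every column
# (using across() inside filter() is deprecated in recent dplyr)
df %>% filter(if_all(everything(), ~ . != "XXX"))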
# X Y
# 1 0.2 0.8
# 2 0.4 0.9
The corresponding version using filter_all():
df %>% filter_all(all_vars(. != "XXX"))
A base R solution:
df[rowSums(df == "XXX") == 0, ]
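To make the base R approach easier to follow, here is a minimal self-contained sketch using the data frame from the question. It shows why na.omit() removes nothing (the "XXX" entries are ordinary factor levels, not NA) and how the rowSums() filter works. The names keep and clean, and the final conversion to numeric, are my own additions for illustration and are not part of the question.

```r
# Example data from the question: "XXX" is a factor level, not a real NA
df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

# na.omit() only drops rows that contain NA, so all rows are kept here
nrow(na.omit(df))

# df == "XXX" is a logical matrix; rowSums() counts the "XXX" cells per row
keep <- rowSums(df == "XXX") == 0

# Keep the rows without "XXX" and drop the now-unused "XXX" level
clean <- droplevels(df[keep, ])

# Optional (assumption: numeric columns are wanted afterwards):
# convert via the labels, not the internal integer codes
clean[] <- lapply(clean, function(col) as.numeric(as.character(col)))
```

Going through as.character() first matters, because calling as.numeric() directly on a factor returns the integer level codes rather than the original values.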
Upvotes: 1
Reputation: 16998
Another option using tidyverse:
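library(dplyr)
library(stringr)
library(tidyr)
df %>%
  # convert factors to character, then turn every "XXX" into NA
  mutate(across(everything(), ~ str_replace(as.character(.x), "XXX", NA_character_))) %>%
  # drop rows that contain at least one NA
  drop_na()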
# X Y
# 1 0.2 0.8
# 2 0.4 0.9
Upvotes: 1
Reputation: 7423
Two base R solutions:
df <- subset(df, X != "XXX" & Y != "XXX")
or
df <- df[df$X != "XXX" & df$Y != "XXX",]
dplyr solution:
library(dplyr)
df <- df %>% filter(X != "XXX" & Y != "XXX")
Gives us:
X Y
1 0.2 0.8
3 0.4 0.9
Upvotes: 1
Reputation: 4314
You can also filter for complete cases like this:
library(dplyr)
library(magrittr)
df %>% replace(.=="XXX", NA_character_) %>% filter(complete.cases(.))
The output is:
> df %>% replace(.=="XXX", NA_character_) %>% filter(complete.cases(.))
X Y
1 0.2 0.8
2 0.4 0.9
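If the data is read from a text file, a related option is to declare the marker as missing at import time, so that na.omit() from the question works as intended. The sketch below assumes CSV input. The inline csv_text string is a hypothetical stand-in for the real file, which the question does not show.

```r
# Hypothetical stand-in for the real file contents
csv_text <- "X,Y
0.2,0.8
XXX,1
0.4,0.9
0.1,XXX"

# na.strings tells read.csv() to treat "XXX" as NA while parsing,
# so both columns are read as numeric instead of factor/character
mydata <- read.csv(text = csv_text, na.strings = "XXX")

# With real NA values in place, na.omit() drops rows 2 and 4
clean <- na.omit(mydata)
```

With a real file, the text argument would be replaced by the file path.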
Upvotes: 2