Reputation: 67
I have three varaibles category,status and review in dataset new
category status review
max state yes
min state yes
min state yes
max fine yes
NA could state yes
max state yes
NA could state yes
I have two condition to process
condition one
delete when category = min
and status = state
i used below code
new1 <- new[!with(new,new$category=="min" &new$status=="state"),]
I got below output
category status review
max state yes
max fine yes
NA could state yes
max state yes
NA could state yes
condition two: remove records having category = min
and status = could state
new2 <- new1[!with(new1,new1$category=="min" &new1$status=="could state"),]
I was expecting below result
category status review
max state yes
max fine yes
NA could state yes
max state yes
NA could state yes
Output which i got is
category status review
max state yes
max fine yes
NA NA NA
max state yes
NA NA NA
I am not sure why records are getting replaced by NA
Is there any other way to give both condition in one condition(remove when category = min
and status = state
and could state
). I have referred other questions available
Upvotes: 1
Views: 27
Reputation: 887203
If a column contains an NA element, the logical condition returns NA for that element unless it is taken care off with is.na
to return TRUE. Here, I negated (!
) to return FALSE for the NA elements in 'category'. As per the OP's two conditions, we need to get TRUE for cases where 'category' is 'min' and 'status' is 'state' OR (|
) where 'category' is 'min' and 'status' is 'could state'. Once we get that TRUE values, just negate (!
) to convert TRUE to FALSE and viceversa. Then, subset the rows based on that index ('i1').
i1 <- !with(new1, (category == "min" & !is.na(category) & status =="state")|
(category =="min" & !is.na(category) & status == "could state"))
new1[i1,]
# category status review
#1 max state yes
#4 max fine yes
#5 <NA> could state yes
#6 max state yes
#7 <NA> could state yes
A slightly more compact option would be to use %in%
for multiple elements
i1 <- !with(new1, (category == "min" & !is.na(category) &
status %in% c("state", "could state")))
new1[i1,]
# category status review
#1 max state yes
#4 max fine yes
#5 <NA> could state yes
#6 max state yes
#7 <NA> could state yes
If we use both as %in%
, we can avoid the is.na
i1 <- !with(new1, (category %in% "min" & status %in% c("state", "could state")))
new1[i1,]
NOTE: In all the cases mentioned above, when using with
, we don't need to do new1$
, just use the column names to get the values.
Just to illustrate the previous point,
v1 <- c(NA, 3, 4, 3)
v1==3
#[1] NA TRUE FALSE TRUE
Note that ==
returns NA for NA values. If we use is.na
v1 ==3 & !is.na(v1)
#[1] FALSE TRUE FALSE TRUE
Or this can be avoided by using %in%
which return FALSE for NA
v1 %in% 3
#[1] FALSE TRUE FALSE TRUE
Regarding the NA rows, if we subset based on the ==
condition, the NA will remain as such
v1[v1==3]
#[1] NA 3 3
Suppose if 'v1' is a column in a data.frame
, in place of NA, an additional NA row will return
d1 <- data.frame(v1)
d1[d1$v1==3,, drop=FALSE]
# v1
#NA NA
#2 3
#4 3
new1 <- structure(list(category = c("max", "min", "min", "max", NA, "max",
NA), status = c("state", "state", "state", "fine", "could state",
"state", "could state"), review = c("yes", "yes", "yes", "yes",
"yes", "yes", "yes")), .Names = c("category", "status", "review"
), class = "data.frame", row.names = c(NA, -7L))
Upvotes: 3