Clarification regarding filters in R

Question

I have three varaibles category,status and review in dataset new

category    status       review
  max       state          yes
  min       state          yes
  min       state          yes
  max       fine           yes
  NA        could state    yes
  max       state          yes
  NA        could state    yes

I have two condition to process

condition one

delete when category = min and status = state

i used below code

new1 <- new[!with(new,new$category=="min" &new$status=="state"),]

I got below output

category    status       review
  max       state         yes
  max       fine          yes
  NA        could state   yes
  max       state         yes
  NA        could state   yes

condition two: remove records having category = min and status = could state

new2 <- new1[!with(new1,new1$category=="min" &new1$status=="could state"),]

I was expecting below result

category    status       review
  max       state          yes
  max       fine           yes
  NA        could state    yes
  max       state          yes   
  NA        could state    yes

Output which i got is

category    status     review
  max       state       yes
  max       fine        yes
  NA        NA          NA
  max       state       yes
  NA        NA          NA

I am not sure why records are getting replaced by NA

Is there any other way to give both condition in one condition(remove when category = min and status = state and could state). I have referred other questions available

akrun · Accepted Answer

If a column contains an NA element, the logical condition returns NA for that element unless it is taken care off with is.na to return TRUE. Here, I negated (!) to return FALSE for the NA elements in 'category'. As per the OP's two conditions, we need to get TRUE for cases where 'category' is 'min' and 'status' is 'state' OR (|) where 'category' is 'min' and 'status' is 'could state'. Once we get that TRUE values, just negate (!) to convert TRUE to FALSE and viceversa. Then, subset the rows based on that index ('i1').

i1 <- !with(new1, (category == "min" & !is.na(category) & status =="state")| 
           (category =="min" & !is.na(category) & status == "could state"))
new1[i1,]
#   category      status review
#1      max       state    yes
#4      max        fine    yes
#5      could state    yes
#6      max       state    yes
#7      could state    yes

A slightly more compact option would be to use %in% for multiple elements

i1 <- !with(new1, (category == "min" & !is.na(category) & 
                            status %in% c("state", "could state")))
new1[i1,]
#  category      status review
#1      max       state    yes
#4      max        fine    yes
#5      could state    yes
#6      max       state    yes
#7      could state    yes

If we use both as %in%, we can avoid the is.na

i1 <- !with(new1, (category %in% "min" & status %in% c("state", "could state")))
new1[i1,]

NOTE: In all the cases mentioned above, when using with, we don't need to do new1$, just use the column names to get the values.

Just to illustrate the previous point,

 v1 <- c(NA, 3, 4, 3)
 v1==3
 #[1]    NA  TRUE FALSE  TRUE

Note that == returns NA for NA values. If we use is.na

 v1 ==3 & !is.na(v1)
#[1] FALSE  TRUE FALSE  TRUE

Or this can be avoided by using %in% which return FALSE for NA

v1 %in% 3
#[1] FALSE  TRUE FALSE  TRUE

Regarding the NA rows, if we subset based on the == condition, the NA will remain as such

v1[v1==3]
#[1] NA  3  3

Suppose if 'v1' is a column in a data.frame, in place of NA, an additional NA row will return

d1 <- data.frame(v1)
d1[d1$v1==3,, drop=FALSE]
#   v1
#NA NA
#2   3
#4   3

data

new1 <- structure(list(category = c("max", "min", "min", "max", NA, "max", 
NA), status = c("state", "state", "state", "fine", "could state", 
"state", "could state"), review = c("yes", "yes", "yes", "yes", 
"yes", "yes", "yes")), .Names = c("category", "status", "review"
), class = "data.frame", row.names = c(NA, -7L))

Clarification regarding filters in R

Answers (1)

data

Related Questions