Remove a row based on conditional value of other rows in multiple columns in R

Question

I am trying to remove rows whose CASEID never appears alongside a specific value in another column. For example, if a CASEID has no row with Form 8, every row for that CASEID should be deleted (CASEID 002 below):

Form  CASEID  
7        001  
8        001  
8        001  
7        002  
7        003  
8        003  
8        003

I have tried to search for an answer to this and haven't been able to find one. I feel like I need an if statement but my co-worker suggested a subset function. Any help would be appreciated!

Rachit Kinger · Accepted Answer

Here are two solutions I could think of: one using subset() and the other using dplyr's inner_join().

The difference between them is that option 1 retains duplicate rows and the original row order, while option 2 removes duplicate rows.

Solution 1 - using subset and keeping duplicate rows:

df[df$CASEID %in% subset(df, Form == 8)$CASEID, ]

The result is:

  Form CASEID
1    7      1
2    8      1
3    8      1
5    7      3
6    8      3
7    8      3
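
To make Solution 1 reproducible, here is a minimal sketch that builds the example data first. It assumes Form and CASEID are stored as numbers (which is why the output shows 1 and 3 rather than 001 and 003); the names ids_with_form8 and kept are only for illustration.

```r
# Example data from the question, assuming numeric columns
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# CASEIDs that have at least one row with Form 8
ids_with_form8 <- unique(df$CASEID[df$Form == 8])

# Keep every row whose CASEID is in that set; row order and duplicates are kept
kept <- df[df$CASEID %in% ids_with_form8, ]
print(kept)
```

This is the same logic as the one-liner above, only split into two steps so the intermediate set of IDs can be inspected.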

Solution 2 - with inner_join()

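library(dplyr)
# CASEIDs that have at least one Form 8 row; distinct() keeps one row per ID,
# which avoids the many-to-many join warning in newer dplyr versions
subset(df, Form == 8) %>% 
      select(CASEID) %>% 
      distinct() %>% 
      inner_join(df, by = "CASEID") %>%   # name the key explicitly
      select(Form, CASEID) %>% 
      distinct()                          # drop duplicate Form/CASEID rows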

The result is:

  Form CASEID
1    7      1
2    8      1
3    7      3
4    8      3
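
If you want to stay in dplyr but keep duplicate rows, as Solution 1 does, a grouped filter or a filtering join should also work. This is a sketch rather than part of the original answer, and it assumes the same numeric example data; kept_grouped and kept_semi are illustrative names.

```r
library(dplyr)

# Example data from the question, assuming numeric columns
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# Option A: keep whole groups (CASEIDs) that contain at least one Form 8 row
kept_grouped <- df %>%
  group_by(CASEID) %>%
  filter(any(Form == 8)) %>%
  ungroup()

# Option B: semi_join() keeps rows of df that have a match in the Form 8 subset,
# without duplicating rows or adding columns
kept_semi <- semi_join(df, filter(df, Form == 8), by = "CASEID")

print(kept_grouped)
print(kept_semi)
```

Both options drop CASEID 2 and keep the remaining rows in their original order. Adding distinct() at the end would reproduce the de-duplicated result of Solution 2.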
