A.D.

Reputation: 61

Remove a row based on conditional value of other rows in multiple columns in R

I am trying to remove rows depending on whether a specific value appears in another row that shares the same ID. Specifically, if a CASEID has no row with Form 8, I want to delete every row for that CASEID. For example:

Form  CASEID  
7        001  
8        001  
8        001  
7        002  
7        003  
8        003  
8        003  

I have tried to search for an answer to this and haven't been able to find one. I feel like I need an if statement but my co-worker suggested a subset function. Any help would be appreciated!

Upvotes: 0

Views: 63

Answers (2)

Rachit Kinger

Reputation: 361

Here are two solutions that I could think of: one using subset and the other using dplyr's inner_join().

The difference between them is that option 1 retains duplicate rows and the original row order, while option 2 removes duplicate rows.

Solution 1 - using subset and keeping duplicate rows:

df[df$CASEID %in% subset(df, Form == 8)$CASEID, ] 

The result is:

  Form CASEID
1    7      1
2    8      1
3    8      1
5    7      3
6    8      3
7    8      3
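For anyone who wants to run Solution 1 from scratch, here is a minimal sketch. It assumes the question's data is stored in a data frame called df with numeric columns (so 001 prints as 1); the intermediate names ids_with_form8 and kept are only illustrative.

```r
# Rebuild the example data from the question.
# Assumption: CASEID is numeric, so the leading zeros are not kept.
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# Step 1: collect the CASEIDs that have at least one Form 8 row.
ids_with_form8 <- unique(df$CASEID[df$Form == 8])

# Step 2: keep every row whose CASEID is in that set.
kept <- df[df$CASEID %in% ids_with_form8, ]
print(kept)
```

If CASEID were stored as character (for example "001"), the same two steps should work unchanged, since %in% compares the values as they are.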

Solution 2 - with inner_join()

library(dplyr)
subset(df, Form == 8) %>%
  select(CASEID) %>%
  inner_join(df, by = "CASEID") %>%  # explicit key avoids the "Joining with" message
  select(Form, CASEID) %>%
  distinct()

The result is:

  Form CASEID
1    7      1
2    8      1
3    7      3
4    8      3
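If you want a dplyr version that keeps the duplicate rows, two other approaches may be worth trying. This is only a sketch under the same assumption about df as in the question; form8_ids, kept_semi and kept_grouped are illustrative names, not anything from the original post.

```r
library(dplyr)

# Example data from the question (assumed numeric columns).
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# Option A: semi_join() keeps the rows of df that have a match in the
# lookup table, without adding columns or multiplying rows.
form8_ids <- df %>%
  filter(Form == 8) %>%
  distinct(CASEID)
kept_semi <- semi_join(df, form8_ids, by = "CASEID")

# Option B: a grouped filter keeps a whole CASEID group
# when any of its rows has Form 8.
kept_grouped <- df %>%
  group_by(CASEID) %>%
  filter(any(Form == 8)) %>%
  ungroup()
```

Both options should drop only the CASEID 2 row and leave the repeated Form 8 rows in place, which is the main difference from the inner_join() plus distinct() pipeline.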

Upvotes: 1

gos

Reputation: 484

new_df <- subset(df, Form==8)

The second parameter of the subset function is a logical expression, much like the condition of the if statement you mentioned. Here, we keep the rows whose Form column is equal to 8, so the result contains only the Form 8 rows.
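To relate this to the question, the sketch below contrasts that call with a subset condition on CASEID. It assumes the same df as in the question; only_form8 and whole_cases are illustrative names.

```r
# Example data from the question (assumed numeric columns).
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# Condition on Form alone: the Form 7 rows are dropped as well.
only_form8 <- subset(df, Form == 8)

# Condition on CASEID: keep every row whose CASEID appears
# among the Form 8 rows, so only CASEID 2 is removed.
whole_cases <- subset(df, CASEID %in% CASEID[Form == 8])
```

Inside subset(), column names such as CASEID and Form are looked up in df, which is why the second condition can refer to both columns directly.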

Upvotes: 1
