Reputation: 61
I am trying to remove rows based on whether another row with the same CASEID has a specific Form value: if a CASEID has no corresponding Form 8 row, all rows for that CASEID should be deleted. For example:
Form CASEID
7 001
8 001
8 001
7 002
7 003
8 003
8 003
I have tried to search for an answer to this and haven't been able to find one. I feel like I need an if statement but my co-worker suggested a subset function. Any help would be appreciated!
Upvotes: 0
Views: 63
Reputation: 361
Here are two solutions that I could think of: one using subset and the other using dplyr's inner_join().
The difference between the solutions is that option 1 retains duplicate rows and the original row order, while option 2 removes duplicate rows.
Solution 1 - using subset and keeping duplicate rows:
df[df$CASEID %in% subset(df, Form == 8)$CASEID, ]
The result is:
Form CASEID
1 7 1
2 8 1
3 8 1
5 7 3
6 8 3
7 8 3
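To reproduce the result above, here is a minimal sketch that rebuilds the example data from the question. It assumes CASEID is stored as a number (which is why it prints as 1 rather than 001); the names `ids_with_form8` and `kept` are only illustrative and not part of the original answer.

```r
# Example data from the question; CASEID stored as numeric (an assumption).
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# CASEIDs that have at least one Form 8 row.
ids_with_form8 <- unique(df$CASEID[df$Form == 8])

# Keep every row whose CASEID is in that set.
# Duplicate rows and the original order are preserved.
kept <- df[df$CASEID %in% ids_with_form8, ]
print(kept)
```

Because the filter is a logical index on the original data frame, the surviving rows keep their original row names, which is why row 4 (CASEID 2) is missing from the printed result.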
Solution 2 - with inner_join()
library(dplyr)
subset(df, Form == 8) %>%
  select(CASEID) %>%
  inner_join(df, by = "CASEID") %>%  # name the join column explicitly
  select(Form, CASEID) %>%
  distinct()
The result is:
Form CASEID
1 7 1
2 8 1
3 7 3
4 8 3
Upvotes: 1
Reputation: 484
new_df <- subset(df, Form==8)
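The second argument of the subset function is a logical expression, much like the condition of an if statement, as you mentioned. Here, we keep the rows whose Form column is equal to 8. Note that this returns only the Form 8 rows themselves, not the other rows that share their CASEID.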
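As a sketch of how this subset could be extended to what the question asks for, the example below rebuilds the question's data as `df` (assuming numeric CASEID values) and uses the illustrative name `form8_rows`. The first call keeps only the Form 8 rows; the second keeps every row whose CASEID appears among them.

```r
# Example data from the question; CASEID stored as numeric (an assumption).
df <- data.frame(
  Form   = c(7, 8, 8, 7, 7, 8, 8),
  CASEID = c(1, 1, 1, 2, 3, 3, 3)
)

# Step 1: only the rows where Form is 8.
form8_rows <- subset(df, Form == 8)

# Step 2: all rows whose CASEID has at least one Form 8 row.
new_df <- subset(df, CASEID %in% form8_rows$CASEID)
print(new_df)
```

With this data, the second step drops only the single row for CASEID 2, since that case has no Form 8 row.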
Upvotes: 1