Reputation: 277
I have a dataset df:
df=data.frame(rbind(c("A",1,1,"abc"),
c("B",0,0,"def"),
c("C",0,1,"hep"),
c("A",1,1,"hit"),
c("B",0,1,"occ"),
c("C",1,1,"tem"),
c("A",1,1,"twi"),
c("B",1,1,"twa"),
c("C",1,1,"mit"),
c("A",1,1,"mot"),
c("C",1,1,"mot"),
c("B",1,1,"mjak")))
names(df)=c("id","v1","v2","check")
I want to create a subset of the ids in df whose values in the "check" column are all included in the "ch.vars" vector.
ch.vars=c("abc","hit","mot","twi","mjak")
If an id contains any values other than those given in "ch.vars", it is to be excluded from the dataset. For example, ids B and C contain other values in the check column, so they are to be excluded from the subset.
Here is what I have tried so far:
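library(dplyr)  # arrange() and filter() come from dplyr
# flag rows whose check value is in ch.vars
df$check.var=ifelse(df$check %in% ch.vars,1,0)
df=arrange(df,id)
# ids that have at least one value outside ch.vars
st1=filter(df,check.var==0)
st1=as.character(unique(st1$id))
# drop those ids entirely
df2=df[!df$id %in% st1,]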
> df2
id v1 v2 check check.var
1 A 1 1 abc 1
2 A 1 1 hit 1
3 A 1 1 twi 1
4 A 1 1 mot 1
This works, but I was wondering if there is a more efficient way to do this, i.e., to achieve the result in fewer steps. Thank you!
Upvotes: 1
Views: 67
Reputation: 4474
And a data.table solution:
library(data.table)
data.table(df)[,.SD[all(check%in%ch.vars)],by="id"]
# id v1 v2 check
#1: A 1 1 abc
#2: A 1 1 hit
#3: A 1 1 twi
#4: A 1 1 mot
You can also use setkey for id to make it faster.
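As a rough sketch of the setkey suggestion, the table can be keyed on id before grouping. The small data frame below is a hypothetical stand-in for the question's df, and whether keying gives a noticeable speed-up will depend on the size of the data, so treat it as something to benchmark rather than a guarantee.

```r
library(data.table)

# Hypothetical stand-in for the question's df (only the columns needed here)
df <- data.frame(id    = c("A", "B", "C", "A", "B", "C"),
                 check = c("abc", "def", "hep", "hit", "occ", "mot"))
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

dt <- as.data.table(df)
setkey(dt, id)  # sorts dt by id and marks id as the key

# Keep a group only if every check value in it is in ch.vars
res <- dt[, .SD[all(check %in% ch.vars)], by = id]
```

Here only id A should survive, since B and C each have at least one check value outside ch.vars.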
Upvotes: 3
Reputation: 78610
You can do this with group_by and filter in the dplyr package:
library(dplyr)
df2 = df %>%
group_by(id) %>%
filter(all(check %in% ch.vars))
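A self-contained sketch of the same pipeline, using a small hypothetical stand-in for the question's df. The trailing ungroup() is an optional addition, not part of the original answer: filter() leaves the result grouped by id, which may or may not be what later steps expect.

```r
library(dplyr)

# Hypothetical stand-in for the question's df (only the columns needed here)
df <- data.frame(id    = c("A", "B", "C", "A", "B", "C"),
                 check = c("abc", "def", "hep", "hit", "occ", "mot"),
                 stringsAsFactors = FALSE)
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

df2 <- df %>%
  group_by(id) %>%
  filter(all(check %in% ch.vars)) %>%  # keep ids whose check values are all in ch.vars
  ungroup()                            # optional: drop the grouping afterwards
```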
Upvotes: 3