Reputation: 33
Somehow I got mentally stuck while filtering patent data. So imagine you have:
expl <- data.frame(PatNr=c(1,1,1,2,2,2,2,2), Country=c("AZ","AZ","PE","AZ","PS","HQ","HQ","PV"))
#> PatNr Country
#> 1 1 AZ
#> 2 1 AZ
#> 3 1 PE
#> 4 2 AZ
#> 5 2 PS
#> 6 2 HQ
#> 7 2 HQ
#> 8 2 PV
What I want is to only have those PatNr cases in my data.frame that contain AZ AND PS. All other PatNr cases can be dropped. So in the given example, I would like the script to delete all PatNr=1 rows and keep the PatNr=2 rows.
Subsetting down to just the two matching rows (in this case) would be tricky, as the actual data has nine more crucial variables attached to it, and these differ per row.
Upvotes: 2
Views: 137
Reputation: 6193
Using base R
res <- lapply(split(expl, expl$PatNr), function(y, lvls) {
  # keep every row of the group only if it contains all required countries
  y[all(lvls %in% y$Country), ]
}, lvls = c("AZ", "PS"))
do.call(rbind, res)
#> PatNr Country
#> 2.4 2 AZ
#> 2.5 2 PS
#> 2.6 2 HQ
#> 2.7 2 HQ
#> 2.8 2 PV
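The 2.4-style row names appear because rbind prefixes each row with the name of the list element it came from. If that is unwanted, one possible variant is to drop the non-matching groups with Filter before binding and then reset the row names. This is only a sketch on the example data from the question; the names required, groups, kept and out are made up for illustration.

```r
expl <- data.frame(PatNr = c(1, 1, 1, 2, 2, 2, 2, 2),
                   Country = c("AZ", "AZ", "PE", "AZ", "PS", "HQ", "HQ", "PV"))

required <- c("AZ", "PS")            # countries every kept PatNr must contain

groups <- split(expl, expl$PatNr)    # one data frame per PatNr
# keep only the groups whose Country column covers all required values
kept <- Filter(function(g) all(required %in% g$Country), groups)

out <- do.call(rbind, kept)
rownames(out) <- NULL                # replace the prefixed names with 1..n
out
```

All other columns of the kept groups come along untouched, so the nine extra variables mentioned in the question should not be a problem.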
Upvotes: 1
Reputation: 13139
Using dplyr:
library(dplyr)
expl2 <- expl %>%
group_by(PatNr) %>%
filter(all(c("AZ","PS") %in% Country))
expl2
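One thing to be aware of: the result of filter on a grouped data frame is still grouped by PatNr, which can affect later mutate or summarise calls. If that is not intended, a small variation (same example data, assuming dplyr is installed) is to add ungroup() at the end:

```r
library(dplyr)

expl <- data.frame(PatNr = c(1, 1, 1, 2, 2, 2, 2, 2),
                   Country = c("AZ", "AZ", "PE", "AZ", "PS", "HQ", "HQ", "PV"))

expl2 <- expl %>%
  group_by(PatNr) %>%
  # the condition is evaluated once per PatNr group and recycled to all its rows
  filter(all(c("AZ", "PS") %in% Country)) %>%
  ungroup()                          # drop the grouping again
expl2
```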
Upvotes: 7
Reputation: 7796
Here's a messy for loop that will do the trick. I'm sure there are better ways, but this should work:
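todelete <- numeric(0)  # PatNr values that lack at least one required country
for (i in unique(expl$PatNr)) {
  countries <- as.character(unique(expl$Country[expl$PatNr == i]))
  if (!all(c("AZ", "PS") %in% countries)) {
    todelete <- c(todelete, i)
  }
}
# keep only the rows whose PatNr was not flagged
expl[!expl$PatNr %in% todelete, ]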
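For what it's worth, the loop can probably be avoided with ave, which applies a function per group and hands the result back for every row of that group. A rough sketch on the example data; required and keep are just illustrative names:

```r
expl <- data.frame(PatNr = c(1, 1, 1, 2, 2, 2, 2, 2),
                   Country = c("AZ", "AZ", "PE", "AZ", "PS", "HQ", "HQ", "PV"))

required <- c("AZ", "PS")

# ave() passes the row indices of each PatNr group to FUN; the single
# TRUE/FALSE it returns is stored as 1/0 for every row of that group
keep <- ave(seq_len(nrow(expl)), expl$PatNr,
            FUN = function(i) all(required %in% expl$Country[i])) == 1

expl[keep, ]
```

Unlike the split/rbind route, this indexes the original data frame directly, so the original row names are kept.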
Upvotes: -1