Seb Z.

Reputation: 33

Filtering matching rows in a single data.frame

Somehow I got mentally stuck while filtering patent data. So imagine you have:

expl <- data.frame(PatNr=c(1,1,1,2,2,2,2,2), Country=c("AZ","AZ","PE","AZ","PS","HQ","HQ","PV"))

#>   PatNr    Country
#> 1       1        AZ
#> 2       1        AZ
#> 3       1        PE
#> 4       2        AZ
#> 5       2        PS
#> 6       2        HQ
#> 7       2        HQ
#> 8       2        PV

What I want is to keep only those PatNr groups whose countries include both AZ and PS. All other PatNr groups can be dropped. So in the given example, the script should delete all PatNr=1 rows and keep all PatNr=2 rows.

Collapsing the rows (into two rows, in this case) would be tricky, as the actual data has nine more crucial variables attached to it, and these differ per row.

Upvotes: 2

Views: 137

Answers (3)

Whitebeard

Reputation: 6193

Using base R

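res <- lapply(split(expl, expl$PatNr), function(y, lvls) {
   # Keep all rows of the group only if every required country occurs in it.
   # Note the comma: the logical indexes rows, not columns.
   y[all(lvls %in% y$Country), ]
}, lvls = c("AZ", "PS"))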
do.call(rbind, res)
    PatNr Country
2.4     2      AZ
2.5     2      PS
2.6     2      HQ
2.7     2      HQ
2.8     2      PV
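
If the prefixed row names (2.4, 2.5, ...) are unwanted, here is a minimal base R sketch that tests each PatNr with tapply() and then subsets the original data.frame once, so the original row names survive. The names required, has_all, keep_ids and result are only illustrative and do not come from the question.

```r
# Sample data from the question
expl <- data.frame(
  PatNr   = c(1, 1, 1, 2, 2, 2, 2, 2),
  Country = c("AZ", "AZ", "PE", "AZ", "PS", "HQ", "HQ", "PV"),
  stringsAsFactors = FALSE
)

# Countries that must all occur within a PatNr (illustrative name)
required <- c("AZ", "PS")

# One TRUE/FALSE per PatNr: does the group contain every required country?
has_all <- tapply(expl$Country, expl$PatNr, function(x) all(required %in% x))

# tapply() names its result by group, so the names give the PatNr values to keep
keep_ids <- names(has_all)[has_all]

# Subset the original rows; all other columns are carried along untouched
result <- expl[as.character(expl$PatNr) %in% keep_ids, ]
result
```

Because the subsetting happens on the original data.frame, any additional columns stay attached to their rows.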

Upvotes: 1

Heroka

Reputation: 13139

Using dplyr:

library(dplyr)


expl2 <- expl %>% 
  group_by(PatNr) %>% 
  filter(all(c("AZ","PS") %in% Country)) 
expl2
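
To see why the condition inside filter() works, it may help to evaluate it by hand for each group. The sketch below uses plain vectors copied from the example data; the variable names are made up for illustration. Within a group, %in% yields one value per required country, and all() collapses that to a single TRUE/FALSE which then applies to every row of the group.

```r
required <- c("AZ", "PS")

# Country values per group, copied from the example data
countries_pat1 <- c("AZ", "AZ", "PE")
countries_pat2 <- c("AZ", "PS", "HQ", "HQ", "PV")

# One logical per required country: AZ is present in PatNr 1, PS is not
per_country_pat1 <- required %in% countries_pat1   # TRUE FALSE

# Collapsed to a single value per group
keep_pat1 <- all(required %in% countries_pat1)     # FALSE, group is dropped
keep_pat2 <- all(required %in% countries_pat2)     # TRUE, group is kept
```

The order of the operands matters: all(Country %in% c("AZ","PS")) would instead ask whether every row of the group is AZ or PS, which is a different question.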

Upvotes: 7

C_Z_

Reputation: 7796

Here's a messy for loop that will do the trick. I'm sure there are better ways, but this should work:

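# Collect the PatNr values that lack at least one of the required countries
todelete <- numeric(0)
for (i in unique(expl$PatNr)) {
  countries <- as.character(unique(expl$Country[expl$PatNr == i]))
  if (!all(c("AZ", "PS") %in% countries)) {
    todelete <- c(todelete, i)
  }
}

# Drop every row that belongs to one of those PatNr values
expl[!expl$PatNr %in% todelete, ]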

Upvotes: -1
