user6401955
user6401955

Reputation: 145

select only rows with duplicate id and specific value from another column in R

I have the following data with ID and value:

id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2","1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2","2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5","2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1","2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2")

value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6)

If you notice, there are multiple values for the same id. What I'd like to do is get the value that are only 3 and 6 only if the IDs are the same. for eg. ID "1103-5" has both 3 and 6, so it should be in the list, but not "2347-2"

I'm using R

One method I tried is the following, but it gives me everything with value 3 and 6.

d <- data.frame(id, value)
group36 <- d[d$value == 3 | d$value == 6,]

and

d %>% group_by(id) %>% filter(3 == value | 6 == value)

The output should be like this:

id  value
1103-5  6
1103-5  3
1104-2  6
1104-2  3
1104-4  6
1104-4  3
1106-2  6
1106-2  3
1106-3  6
1106-3  3
2294-1  3
2294-1  6
2294-2  3
2294-2  6
2294-3  3
2294-3  6
2294-4  3
2294-4  6
2294-5  3
2294-5  6

Upvotes: 0

Views: 2627

Answers (2)

Tung
Tung

Reputation: 28331

Not sure if this is what you want. We can filter rows that equal to either 3 or 6 then convert from long to wide format and keep only columns which have both 3 and 6 values. After that, convert back to long format.

library(dplyr)
library(tidyr)

id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2",
        "1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2",
        "2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5",
        "2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1",
        "2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2")

value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6)

d <- data.frame(id, value)

d %>% 
  group_by(id) %>% 
  filter(value %in% c(3, 6)) %>% 
  mutate(rows = 1:n()) %>%
  spread(key = id, value) %>% 
  select_if(~ all(!is.na(.)))

#> # A tibble: 2 x 11
#>    rows `1103-5` `1104-2` `1104-4` `1106-2` `1106-3` `2294-1` `2294-2`
#>   <int>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     1        6        6        6        6        6        3        3
#> 2     2        3        3        3        3        3        6        6
#> # ... with 3 more variables: `2294-3` <dbl>, `2294-4` <dbl>,
#> #   `2294-5` <dbl>

d %>% 
  group_by(id) %>% 
  filter(value %in% c(3, 6)) %>% 
  mutate(rows = 1:n()) %>%
  spread(key = id, value) %>% 
  select_if(~ all(!is.na(.))) %>% 
  select(-rows) %>% 
  gather(id, value)

#> # A tibble: 20 x 2
#>    id     value
#>    <chr>  <dbl>
#>  1 1103-5     6
#>  2 1103-5     3
#>  3 1104-2     6
#>  4 1104-2     3
#>  5 1104-4     6
#>  6 1104-4     3
#>  7 1106-2     6
#>  8 1106-2     3
#>  9 1106-3     6
#> 10 1106-3     3
#> 11 2294-1     3
#> 12 2294-1     6
#> 13 2294-2     3
#> 14 2294-2     6
#> 15 2294-3     3
#> 16 2294-3     6
#> 17 2294-4     3
#> 18 2294-4     6
#> 19 2294-5     3
#> 20 2294-5     6

Created on 2018-07-01 by the reprex package (v0.2.0.9000).

Upvotes: 1

iod
iod

Reputation: 7592

d<-group_by(d,id)
filter(d,any(value==3),any(value==6))

This gives you all the IDs where there is both a value of 3 (somewhere) AND a value of 6 (somewhere). Mind you, your data contains some IDs with THREE values. In these cases, if both 3 and 6 are present, it will be included in the result.

If you want to exclude those lines that remain which done equal 3 or 6, add this:

filter(d,value==3 | value==6)

If you want to exclude IDs that also have 3 and 6 as values but also have OTHER values, use this:

filter(d,any(value==3),any(value==6),value==3 | value==6)

Upvotes: 2

Related Questions