extracting rows depending on the value of multiple columns

Question

I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.

dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
       "col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
       "col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

   col1 col2 col3
1     1    A    Z
2     1    A    Z
3     1    A    Z
4     1    A    Z
5     1    A    Z
6     1    A    Z
7     1    A    Z
8     1    A    Z
9     1    A    Z
10    1    A    Y
11    1    A    Y
12    1    A    Y
13    1    A    Y
14    1    A    Y
15    1    A    Y

So in this case I would need to return just the following:

   col1 col2 col3
      1    A    Z
      1    A    Y

If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows. Any ideas?

Thanks

Gopala · Accepted Answer

Try this with:

library(dplyr)
dat %>%
  group_by(col1, col2) %>%
  filter(length(unique(col3)) > 1) %>%
  distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups:   col1, col2 [0]
# ... with 3 variables: col1 , col2 , col3

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups:   col1, col2 [1]
   col1 col2  col3 
    
1     1 A     Z    
2     1 A     Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

extracting rows depending on the value of multiple columns

Answers (2)

Related Questions