chippycentra
chippycentra

Reputation: 3432

Remove groups within dataframe using dplyr

I would need help in order to remove groups where I have 1 unique value in COL3. Here is the df

 COL1 COL2           COL3   COL4
    G1   SEQ1_-_eje.2   SEQB87 A
    G1   SEQ.2_+_eje.2  SQB83  B 
    G1   SEQ.2_+_eje.2  SQB800 C
    G1   SEQ1_-_eje.2   SEB900 D
    G2   SEQJ_-_eje.9   SEQOPL E
    G2   SEQJ_-_eje.7   SEQOPL F
    G2   SEQJ_-_eQe.3   SEQOPL G
    G3   SEQQ_+_eje.2   LOP2   H
    G3   SEQQ_-_eje.2   LOPE   I

here only G2 have a unique COL3 value SEQOPL so I remove this group and keep G1 and G3that have respectively 3 and 2 COL3 different elements.

expected result :

 COL1 COL2           COL3   COL4
    G1   SEQ1_-_eje.2   SEQB87 A
    G1   SEQ.2_+_eje.2  SQB83  B 
    G1   SEQ.2_+_eje.2  SQB800 C
    G1   SEQ1_-_eje.2   SEB900 D
    G3   SEQQ_+_eje.2   LOP2   H
    G3   SEQQ_-_eje.2   LOPE   I

Maybe the idea would be to remove every groups where the length(unique(df$COL3))=1 ?

Upvotes: 0

Views: 319

Answers (2)

starja
starja

Reputation: 10375

Your idea is correct, the trick is to group by COL1 first:

df <- data.frame(COL1 = c(rep("G1", 4), rep("G2", 3), rep("G3", 2)),
                 COL3 = c("SEQB87", "SQB83", "SQB800", "SQB900",
                          rep("SEQOPL", 3), "LOP2", "LOPE"),
                 COL4 = LETTERS[1:9])

library(dplyr)

df %>% 
  group_by(COL1) %>% 
  filter(length(unique(COL3)) > 1)

# A tibble: 6 x 3
# Groups:   COL1 [2]
  COL1  COL3   COL4 
  <fct> <fct>  <fct>
1 G1    SEQB87 A    
2 G1    SQB83  B    
3 G1    SQB800 C    
4 G1    SQB900 D    
5 G3    LOP2   H    
6 G3    LOPE   I  

Upvotes: 2

Martin Gal
Martin Gal

Reputation: 16998

Try this one using package dplyr or tidyverse:

df %>%
  group_by(COL3) %>%
  mutate(n=n()) %>%
  filter(n == 1) %>%
  select(-n)

This gives you

# A tibble: 6 x 4
# Groups:   COL3 [6]
  COL1  COL2          COL3   COL4 
  <chr> <chr>         <chr>  <chr>
1 G1    SEQ1_-_eje.2  SEQB87 A    
2 G1    SEQ.2_+_eje.2 SQB83  B    
3 G1    SEQ.2_+_eje.2 SQB800 C    
4 G1    SEQ1_-_eje.2  SEB900 D    
5 G3    SEQQ_+_eje.2  LOP2   H    
6 G3    SEQQ_-_eje.2  LOPE   I 

Data

df <- read_table2("COL1 COL2           COL3   COL4
    G1   SEQ1_-_eje.2   SEQB87 A
    G1   SEQ.2_+_eje.2  SQB83  B 
    G1   SEQ.2_+_eje.2  SQB800 C
    G1   SEQ1_-_eje.2   SEB900 D
    G2   SEQJ_-_eje.9   SEQOPL E
    G2   SEQJ_-_eje.7   SEQOPL F
    G2   SEQJ_-_eQe.3   SEQOPL G
    G3   SEQQ_+_eje.2   LOP2   H
    G3   SEQQ_-_eje.2   LOPE   I")

Upvotes: 0

Related Questions