Reputation: 3432
I would need help in order to remove groups where I have 1 unique value in COL3. Here is the df
COL1 COL2 COL3 COL4
G1 SEQ1_-_eje.2 SEQB87 A
G1 SEQ.2_+_eje.2 SQB83 B
G1 SEQ.2_+_eje.2 SQB800 C
G1 SEQ1_-_eje.2 SEB900 D
G2 SEQJ_-_eje.9 SEQOPL E
G2 SEQJ_-_eje.7 SEQOPL F
G2 SEQJ_-_eQe.3 SEQOPL G
G3 SEQQ_+_eje.2 LOP2 H
G3 SEQQ_-_eje.2 LOPE I
here only G2
have a unique COL3
value SEQOPL
so I remove this group and keep G1
and G3
that have respectively 3
and 2 COL3
different elements.
expected result :
COL1 COL2 COL3 COL4
G1 SEQ1_-_eje.2 SEQB87 A
G1 SEQ.2_+_eje.2 SQB83 B
G1 SEQ.2_+_eje.2 SQB800 C
G1 SEQ1_-_eje.2 SEB900 D
G3 SEQQ_+_eje.2 LOP2 H
G3 SEQQ_-_eje.2 LOPE I
Maybe the idea would be to remove every groups where the length(unique(df$COL3))=1
?
Upvotes: 0
Views: 319
Reputation: 10375
Your idea is correct, the trick is to group by COL1
first:
df <- data.frame(COL1 = c(rep("G1", 4), rep("G2", 3), rep("G3", 2)),
COL3 = c("SEQB87", "SQB83", "SQB800", "SQB900",
rep("SEQOPL", 3), "LOP2", "LOPE"),
COL4 = LETTERS[1:9])
library(dplyr)
df %>%
group_by(COL1) %>%
filter(length(unique(COL3)) > 1)
# A tibble: 6 x 3
# Groups: COL1 [2]
COL1 COL3 COL4
<fct> <fct> <fct>
1 G1 SEQB87 A
2 G1 SQB83 B
3 G1 SQB800 C
4 G1 SQB900 D
5 G3 LOP2 H
6 G3 LOPE I
Upvotes: 2
Reputation: 16998
Try this one using package dplyr
or tidyverse
:
df %>%
group_by(COL3) %>%
mutate(n=n()) %>%
filter(n == 1) %>%
select(-n)
This gives you
# A tibble: 6 x 4
# Groups: COL3 [6]
COL1 COL2 COL3 COL4
<chr> <chr> <chr> <chr>
1 G1 SEQ1_-_eje.2 SEQB87 A
2 G1 SEQ.2_+_eje.2 SQB83 B
3 G1 SEQ.2_+_eje.2 SQB800 C
4 G1 SEQ1_-_eje.2 SEB900 D
5 G3 SEQQ_+_eje.2 LOP2 H
6 G3 SEQQ_-_eje.2 LOPE I
df <- read_table2("COL1 COL2 COL3 COL4
G1 SEQ1_-_eje.2 SEQB87 A
G1 SEQ.2_+_eje.2 SQB83 B
G1 SEQ.2_+_eje.2 SQB800 C
G1 SEQ1_-_eje.2 SEB900 D
G2 SEQJ_-_eje.9 SEQOPL E
G2 SEQJ_-_eje.7 SEQOPL F
G2 SEQJ_-_eQe.3 SEQOPL G
G3 SEQQ_+_eje.2 LOP2 H
G3 SEQQ_-_eje.2 LOPE I")
Upvotes: 0