Reputation: 55
I want to delete every row whose name in column A appears more than once. For example, column A contains these names:
A | B | C | D |
---|---|---|---|
ABC123 | 0 | 1 | 1 |
ABC123 | 1 | 2 | 2 |
X2X2 | 1 | 1 | 0 |
X1XD-01 | 1 | 0 | 0 |
BC-56 | 0 | 2 | 1 |
BC-56 | 1 | 1 | 1 |
YUA09 | 0 | 0 | 1 |
GGO-09S | 0 | 1 | 2 |
If a name in column A occurs more than once, all rows with that name should be deleted, not just the extra copies.
Goal:
A | B | C | D |
---|---|---|---|
X2X2 | 1 | 1 | 0 |
X1XD-01 | 1 | 0 | 0 |
YUA09 | 0 | 0 | 1 |
GGO-09S | 0 | 1 | 2 |
What is the most efficient way to approach this?
Thanks
Upvotes: 0
Views: 390
Reputation: 389175
Count the frequency of each value with table and keep only the values that occur in exactly one row.
subset(df, A %in% names(Filter(function(x) x == 1, table(A))))
# A B C D
#3 X2X2 1 1 0
#4 X1XD-01 1 0 0
#7 YUA09 0 0 1
#8 GGO-09S 0 1 2
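The same idea can be spelled out step by step. This is only a sketch: it rebuilds the question's table as a data frame named df, and the intermediate names counts, unique_names and result are made up for illustration.

```r
# Example data, mirroring the table in the question
df <- data.frame(
  A = c("ABC123", "ABC123", "X2X2", "X1XD-01",
        "BC-56", "BC-56", "YUA09", "GGO-09S"),
  B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L),
  D = c(1L, 2L, 0L, 0L, 1L, 1L, 1L, 2L)
)

# Number of rows per name in column A
counts <- table(df$A)

# Names that occur exactly once (hypothetical helper variable)
unique_names <- names(counts)[counts == 1]

# Keep only the rows whose name is in that set; row order is preserved
result <- df[df$A %in% unique_names, ]
print(result)
```

Because %in% is evaluated against the original column, the surviving rows keep their original order and row names, even though table sorts the names internally.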
Upvotes: 1
Reputation: 9878
We can group_by the desired column and filter out all groups with n() >= 2, keeping only the groups of size 1:
library(dplyr)
df %>% group_by(A) %>% filter(n()==1)
# A tibble: 4 x 4
# Groups: A [4]
A B C D
<chr> <int> <int> <int>
1 X2X2 1 1 0
2 X1XD-01 1 0 0
3 YUA09 0 0 1
4 GGO-09S 0 1 2
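As the "Groups: A [4]" line shows, the result is still grouped by A. If later steps should treat the table as a whole, one option is to add ungroup() at the end. A minimal sketch, assuming dplyr is installed and using a shortened copy of the question's data (the data frame here is illustrative, not the asker's real one):

```r
library(dplyr)

# Shortened example data; column names follow the question
df <- data.frame(
  A = c("ABC123", "ABC123", "X2X2", "BC-56", "BC-56", "YUA09"),
  B = c(0L, 1L, 1L, 0L, 1L, 0L)
)

result <- df %>%
  group_by(A) %>%        # one group per name
  filter(n() == 1) %>%   # keep groups that contain a single row
  ungroup()              # drop the grouping from the result

print(result)
```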
Upvotes: 2
Reputation: 887651
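We can use duplicated to create a logical vector. On its own, duplicated flags only the second and later occurrences of a value, so it is combined with a second call using fromLast = TRUE to flag every row whose name is repeated. Negating the result keeps only the unique rows.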
df1[!(duplicated(df1$A)|duplicated(df1$A, fromLast = TRUE)),]
A B C D
3 X2X2 1 1 0
4 X1XD-01 1 0 0
7 YUA09 0 0 1
8 GGO-09S 0 1 2
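To see why both calls are needed, it may help to look at the two logical vectors separately. The vector x below is a shortened copy of the names from the question, and keep is a name chosen here for illustration.

```r
# A few of the names from column A
x <- c("ABC123", "ABC123", "X2X2", "BC-56", "BC-56", "YUA09")

# Scanning forward: only the later copy of each repeated name is flagged
duplicated(x)
# FALSE  TRUE FALSE FALSE  TRUE FALSE

# Scanning backward: only the earlier copy is flagged
duplicated(x, fromLast = TRUE)
#  TRUE FALSE FALSE  TRUE FALSE FALSE

# A value is unique only if neither scan flags it
keep <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
x[keep]
# "X2X2"  "YUA09"
```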
Data
df1 <- structure(list(A = c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56",
"BC-56", "YUA09", "GGO-09S"), B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L,
0L), C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L), D = c(1L, 2L, 0L,
0L, 1L, 1L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 0