Reputation: 55

Deleting all rows that have the same name in a column

I'm looking for a way to delete every row whose name is duplicated. For example, I have names in column A like these:

A	B	C	D
ABC123	0	1	1
ABC123	1	2	2
X2X2	1	1	0
X1XD-01	1	0	0
BC-56	0	2	1
BC-56	1	1	1
YUA09	0	0	1
GGO-09S	0	1	2

If a name in column A appears more than once, all rows with that name should be deleted, not just the extra copies.

Goal:

A	B	C	D
X2X2	1	1	0
X1XD-01	1	0	0
YUA09	0	0	1
GGO-09S	0	1	2

What is the most efficient way to approach this?

Thanks

Upvotes: 0

Answers (3)

Ronak Shah

Reputation: 389175

Count how often each name occurs with table and keep only the rows whose name occurs exactly once.

subset(df, A %in% names(Filter(function(x) x == 1, table(A))))

#        A B C D
#3    X2X2 1 1 0
#4 X1XD-01 1 0 0
#7   YUA09 0 0 1
#8 GGO-09S 0 1 2
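To make the one-liner above easier to follow, here is a minimal step-by-step sketch of the same counting idea in base R. It assumes the data frame is called df with the values from the question; the intermediate names counts and singletons are purely illustrative.

```r
# Sample data from the question (assumed to be stored in a data frame called df)
df <- data.frame(
  A = c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56", "BC-56", "YUA09", "GGO-09S"),
  B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L),
  D = c(1L, 2L, 0L, 0L, 1L, 1L, 1L, 2L)
)

# Step 1: count how many rows each name occupies
counts <- table(df$A)

# Step 2: keep only the names whose count is exactly 1
singletons <- names(counts)[counts == 1]

# Step 3: keep the rows whose name is one of those singletons
result <- df[df$A %in% singletons, ]
print(result)
```

Filter(function(x) x == 1, table(A)) in the answer does the same job as step 2; names() then turns the surviving table entries into the vector of names that %in% checks against.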

Upvotes: 1

GuedesBF

Reputation: 9878

We can group_by the desired column and keep only the groups with exactly one row, which drops every group with n() >= 2:

library(dplyr)

df %>% group_by(A) %>% filter(n()==1)

# A tibble: 4 x 4
# Groups:   A [4]
  A           B     C     D
  <chr>   <int> <int> <int>
1 X2X2        1     1     0
2 X1XD-01     1     0     0
3 YUA09       0     0     1
4 GGO-09S     0     1     2
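If you would rather not load dplyr, the same "keep groups of size 1" idea can be sketched in base R with ave(), which applies a function within each group and returns a result aligned with the original rows. This swaps in a different technique from the answer above and assumes the data frame is called df.

```r
# Sample data from the question (assumed to be stored in a data frame called df)
df <- data.frame(
  A = c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56", "BC-56", "YUA09", "GGO-09S"),
  B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L),
  D = c(1L, 2L, 0L, 0L, 1L, 1L, 1L, 2L)
)

# For every row, compute the size of the group its name belongs to
# (ave() returns one value per row, in the original row order)
group_size <- ave(seq_along(df$A), df$A, FUN = length)

# Keep only rows that are alone in their group: the base-R analogue of filter(n() == 1)
result <- df[group_size == 1, ]
print(result)
```

Unlike the dplyr version, the result is an ordinary data frame with no leftover grouping.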

Upvotes: 2

akrun

Reputation: 887651

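We can use duplicated to create a logical vector. On its own, duplicated flags only the second and later occurrences of a value, so we combine it with duplicated(..., fromLast = TRUE) to flag every occurrence of a repeated name, then negate the result: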

df1[!(duplicated(df1$A)|duplicated(df1$A, fromLast = TRUE)),]
        A B C D
3    X2X2 1 1 0
4 X1XD-01 1 0 0
7   YUA09 0 0 1
8 GGO-09S 0 1 2
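To see why both calls are needed, this small sketch prints the two intermediate logical vectors side by side for the names in column A. The variable names from_first, from_last and is_repeated are illustrative only.

```r
# Column A from the question
A <- c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56", "BC-56", "YUA09", "GGO-09S")

# TRUE for the 2nd and later occurrences of a value (scanning front to back)
from_first <- duplicated(A)

# TRUE for every occurrence except the last one (scanning back to front)
from_last <- duplicated(A, fromLast = TRUE)

# OR-ing them marks every occurrence of any repeated value, including the first
is_repeated <- from_first | from_last

print(data.frame(A, from_first, from_last, is_repeated))

# Row positions that survive the filter
keep_rows <- which(!is_repeated)
print(keep_rows)
```

Using only !duplicated(A) would keep the first ABC123 and the first BC-56 row, which is the usual "remove extra copies" behaviour rather than what the question asks for.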

data

df1 <- structure(list(A = c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56", 
"BC-56", "YUA09", "GGO-09S"), B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 
0L), C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L), D = c(1L, 2L, 0L, 
0L, 1L, 1L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-8L))
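If this filter is needed for more than one column, it can be wrapped in a small reusable function. The sketch below is an assumption, not part of the answer: drop_repeated is a hypothetical helper name, and it uses a slightly different formulation (collect the values that repeat, then drop every row whose value is among them) that gives the same rows as the duplicated | duplicated(fromLast = TRUE) version.

```r
# Same data as above, written with data.frame() for readability
df1 <- data.frame(
  A = c("ABC123", "ABC123", "X2X2", "X1XD-01", "BC-56", "BC-56", "YUA09", "GGO-09S"),
  B = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  C = c(1L, 2L, 1L, 0L, 2L, 1L, 0L, 1L),
  D = c(1L, 2L, 0L, 0L, 1L, 1L, 1L, 2L)
)

# Hypothetical helper: drop every row whose value in column `col` occurs more than once
drop_repeated <- function(data, col) {
  x <- data[[col]]
  # duplicated() flags each repeat after the first, so this collects the repeated values
  repeated_values <- unique(x[duplicated(x)])
  # drop = FALSE keeps a data frame even if only one column is present
  data[!(x %in% repeated_values), , drop = FALSE]
}

result <- drop_repeated(df1, "A")
print(result)
```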

Upvotes: 0
