Reputation: 877
I have this data:
ID N1 N2
1 6387 6187
2 6290 6386
3 6295 6295
4 6043 6392
5 6042 6043
6 6050 6031
7 6050 6036
8 6321 6434
9 6440 6397
with duplicate values spread across both columns and different rows, and I would like to remove ALL rows that contain any duplicated value in N1 or N2, like this:
ID N1 N2
1 6387 6187
2 6290 6386
8 6321 6434
9 6440 6397
I tried the following:
dt[(duplicated(dt[,2:3]) | duplicated(dt[,2:3], fromLast = TRUE)), ]
and
library(dplyr)
dt %>% distinct(N1,N2, .keep_all = TRUE)
But neither worked: both treat each (N1, N2) pair as the unit of comparison rather than looking at the individual values across both columns.
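For reference, the example data can be rebuilt with something like this (a sketch assuming a plain data.frame; dt could equally be a data.table or tibble):

dt <- data.frame(
  ID = 1:9,
  N1 = c(6387, 6290, 6295, 6043, 6042, 6050, 6050, 6321, 6440),
  N2 = c(6187, 6386, 6295, 6392, 6043, 6031, 6036, 6434, 6397)
)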
Upvotes: 0
Views: 90
Reputation:
Simple way to filter using dplyr. Each line builds up the logic:
library(dplyr)

df %>%
  filter(!duplicated(N1),
         !duplicated(N2),
         !duplicated(N1, fromLast = TRUE),
         !duplicated(N2, fromLast = TRUE),
         !(N1 %in% N2),
         !(N2 %in% N1))
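The duplicated() checks handle repeats within each column (the forward and fromLast versions together drop every copy, not just the later ones), while the two %in% conditions handle values shared between N1 and N2. filter() combines all of its arguments with &, so the order of the six conditions does not matter.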
Upvotes: 1
Reputation: 28705
You can make the contents of your data frame into a single vector, then check duplicated() forward and duplicated() backward. Next, reshape these results into the same dimensions as your data and keep the rows where the rowSums of this new dupes matrix is 0, i.e. no value in the row is a duplicate. (It's not clear whether you want to include ID in your duplicate checks, but you can change unlist(df) to unlist(df[some_columns]) and it will work the same.)
df_vec <- unlist(df)
dupes <- duplicated(df_vec) | duplicated(df_vec, fromLast = TRUE)
df[rowSums(matrix(dupes, nrow(df))) == 0,]
# ID N1 N2
# 1 1 6387 6187
# 2 2 6290 6386
# 8 8 6321 6434
# 9 9 6440 6397
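To see the intermediate step (a sketch, assuming the question's data is in df; the name dupes_by_cell is only illustrative), the reshaped flags form one logical column per original column, and rowSums() counts the duplicated cells in each row:

dupes_by_cell <- matrix(dupes, nrow(df), dimnames = list(NULL, names(df)))
rowSums(dupes_by_cell)
# should be 0 for rows 1, 2, 8 and 9, which are exactly the rows kept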
Upvotes: 3
Reputation: 174476
Your own method only needed a small modification: concatenate the two columns into one long vector instead of passing them as a two-column data frame.
nondupes <- !duplicated(c(dt[, 2], dt[, 3])) &
            !duplicated(c(dt[, 2], dt[, 3]), fromLast = TRUE)

dt[nondupes[1:nrow(dt)] & nondupes[nrow(dt) + 1:nrow(dt)], ]
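The first half of nondupes corresponds to N1 and the second half to N2, so an equivalent way to write the last line is to fold the vector back into two columns (a sketch assuming dt is a plain data.frame, so dt[, 2] is a vector; keep_mat is just an illustrative name):

keep_mat <- matrix(nondupes, ncol = 2)   # column 1 = N1 flags, column 2 = N2 flags
dt[keep_mat[, 1] & keep_mat[, 2], ]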
Upvotes: 1
Reputation: 260
Here is a base R solution (data.table is only loaded to fread() the data):
library(data.table)
my_data <- fread("
ID N1 N2
1 6387 6187
2 6290 6386
3 6295 6295
4 6043 6392
5 6042 6043
6 6050 6031
7 6050 6036
8 6321 6434
9 6440 6397
")
# duplicated entries
dup <- c(my_data$N1, my_data$N2)[duplicated(c(my_data$N1, my_data$N2))]
# keep rows that do not bear any of the duplicated entries
my_data[!(my_data$N1 %in% dup | my_data$N2 %in% dup),]
ID N1 N2
1: 1 6387 6187
2: 2 6290 6386
3: 8 6321 6434
4: 9 6440 6397
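For the posted data, dup should contain exactly the values that occur more than once across N1 and N2 combined:

dup
# [1] 6050 6295 6043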
Upvotes: 1
Reputation: 40171
One dplyr and tidyr option could be:
library(dplyr)
library(tidyr)
library(tibble)   # for rowid_to_column()

df %>%
  rowid_to_column() %>%
  pivot_longer(-c(ID, rowid)) %>%
  group_by(value) %>%
  filter(n() == 1) %>%   # drop every cell whose value is repeated anywhere
  group_by(rowid) %>%
  filter(n() == 2) %>%   # keep rows where both N1 and N2 survived
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-rowid)
ID N1 N2
<int> <int> <int>
1 1 6387 6187
2 2 6290 6386
3 8 6321 6434
4 9 6440 6397
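The two filters do the heavy lifting: after pivoting to long format, group_by(value) %>% filter(n() == 1) drops every cell whose value occurs more than once anywhere in N1 or N2, and group_by(rowid) %>% filter(n() == 2) then keeps only the rows where both cells survived, which pivot_wider() turns back into the original wide shape.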
Upvotes: 1