RufM

Reputation: 65

How to delete rows with inverted values in R?

I have a table melted from a pairwise distance matrix of SNP differences. The first column holds the pairs of isolates, where each label combines the isolate number from the matrix's column with the isolate number from the matrix's row, like so:

Patients  Method1 Method2
101_117   0       0
101_98    0       0
117_101   0       0
117_98    0       0
120_128   0       0

I want to do further analysis on this data, and for that I would like to eliminate the rows with duplicated pairs of isolates. However, the duplicated pairs are inverted: isolates 101 and 117, for example, appear in the table both as 101_117 and as 117_101. I would like to keep just one row for each such pair.

The basic commands duplicated and unique didn't solve my problem since the duplicate pairs have inverted names. I have also tried to follow the suggestions given in another question (Deleting reversed duplicates with R) but couldn't get them to work with my data, as I am not that experienced with R.

Any suggestions? Thank you in advance!

Upvotes: 2

Views: 244

Answers (2)

Wimpel

Reputation: 27792

I believe this will work (the same approach also works on data.frames, by the way):

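library(data.table)
library(stringr)

# Example data from the question
DT <- fread("Patients Method1 Method2
101_117 0 0
101_98 0 0
117_101 0 0
117_98 0 0
120_128 0 0")

# Extract the isolate numbers of each pair, sort them so that inverted pairs
# produce the same key, and keep the first row for each key
DT[!duplicated(lapply(str_extract_all(DT$Patients, "[0-9]+"), sort)), ]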

#    Patients Method1 Method2
# 1:  101_117       0       0
# 2:   101_98       0       0
# 3:   117_98       0       0
# 4:  120_128       0       0
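
For a plain data.frame, the same idea can be sketched in base R without extra packages. This is only an illustration: the data frame `df` and the helper names `ids`, `key` and `df_unique` are made up here, and the isolate IDs are assumed to be purely numeric.

```r
# Hypothetical example data, copied from the question
df <- data.frame(
  Patients = c("101_117", "101_98", "117_101", "117_98", "120_128"),
  Method1 = 0,
  Method2 = 0,
  stringsAsFactors = FALSE
)

# Pull the runs of digits out of each label (one character vector per row)
ids <- regmatches(df$Patients, gregexpr("[0-9]+", df$Patients))

# Sort each pair numerically so that 117_101 and 101_117 give the same key
key <- lapply(ids, function(p) sort(as.integer(p)))

# duplicated() also works on a list; keep the first row seen for each key
df_unique <- df[!duplicated(key), ]
df_unique
```

The original Patients labels are left untouched; only the first occurrence of each unordered pair is kept.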

Upvotes: 1

Sonny

Reputation: 3183

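You could sort the isolate numbers within each Patients label, so that inverted pairs become identical, and then duplicated should work. Note that this overwrites the Patients column with the sorted labels.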

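# Rewrite each label with its isolate numbers in ascending order,
# so that "117_101" becomes "101_117" (as.character guards against factors)
df$Patients <- sapply(as.character(df$Patients), function(x) {
  paste(sort(as.numeric(unlist(strsplit(x, "_")))), collapse = "_")
}, USE.NAMES = FALSE)

# Drop every row whose sorted label has already been seen
df <- df[!duplicated(df$Patients), ]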
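
If the original labels should stay as they are, one option is to build the order-independent key in a separate vector with pmin and pmax instead of overwriting the column. The sketch below assumes that every label contains exactly two numeric IDs separated by an underscore; `df`, `pair_key` and `df_dedup` are names chosen for this example only.

```r
# Hypothetical example data, copied from the question
df <- data.frame(
  Patients = c("101_117", "101_98", "117_101", "117_98", "120_128"),
  Method1 = 0,
  Method2 = 0,
  stringsAsFactors = FALSE
)

# Split each label into its two isolate numbers
parts <- strsplit(as.character(df$Patients), "_", fixed = TRUE)
a <- as.numeric(vapply(parts, `[`, character(1), 1))
b <- as.numeric(vapply(parts, `[`, character(1), 2))

# Smaller ID first, larger ID second: the same key for a pair and its inverse
pair_key <- paste(pmin(a, b), pmax(a, b), sep = "_")

# Filter on the key, leaving the Patients column unchanged
df_dedup <- df[!duplicated(pair_key), ]
df_dedup
```

Because pmin and pmax are vectorised, this avoids sorting each pair in a loop, which may matter for a large melted distance matrix.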

Upvotes: 0
