Compare two data.frame and delete rows with common characters

Question

I have two data.frames, x1 and x2. I want to remove rows from x2 if they contain a gene that is also found in x1.

x1 <- chr   start   end         Genes   
      1      8401    8410      Mndal,Mnda,Ifi203,Ifi202b    
      2      8001    8020      Cyb5r1,Adipor1,Klhl12    
      3      4001    4020      Alyref2,Itln1,Cd244  

x2 <- chr   start   end         Genes
      1      8861   8868       Olfr1193 
      1      8405    8420      Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b   
      2      8501    8520      Chia,Chi3l3,Chi3l4   
      3      4321    4670      Tdpoz4,Tdpoz3,Tdpoz5

Expected output:

x2 <- chr   start   end         Genes   
      1      8861   8868       Olfr1193
      2      8501    8520      Chia,Chi3l3,Chi3l4   
      3      4321    4670      Tdpoz4,Tdpoz3,Tdpoz5

akrun · Accepted Answer

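You could try comparing the rows pairwise (this assumes 'x1' and 'x2' have the same number of rows, in matching order):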

x2[mapply(function(x,y) !any(x %in% y), 
        strsplit(x1$Genes, ','), strsplit(x2$Genes, ',')),]
#  chr start  end                Genes
#2   2  8501 8520   Chia,Chi3l3,Chi3l4
#3   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5

Or replace !any(x %in% y) with length(intersect(x,y))==0.
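To see that the two tests agree, here is a small sketch on gene vectors taken from the question. The names x, y, z, keep_in and keep_intersect are only for illustration.

```r
# Gene lists as they look after strsplit(), copied from the question's data
x <- c("Mndal", "Mnda", "Ifi203", "Ifi202b")
y <- c("Mrgprx3-ps", "Mrgpra1", "Mrgpra2a", "Mndal", "Mrgpra2b")  # shares "Mndal" with x
z <- c("Chia", "Chi3l3", "Chi3l4")                                # shares nothing with x

# TRUE means "no common gene, keep the row"
keep_in        <- c(!any(x %in% y), !any(x %in% z))
keep_intersect <- c(length(intersect(x, y)) == 0,
                    length(intersect(x, z)) == 0)

keep_in         # FALSE TRUE
keep_intersect  # FALSE TRUE
```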

NOTE: If the "Genes" column is a factor, convert it to character first, because strsplit cannot take a factor, i.e. strsplit(as.character(x1$Genes), ',')
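A minimal sketch of that conversion, assuming the column was read in as a factor. The vector `genes` stands in for x1$Genes and is only illustrative.

```r
# Stand-in for a "Genes" column that was read in as a factor
genes <- factor(c("Mndal,Mnda,Ifi203,Ifi202b", "Cyb5r1,Adipor1,Klhl12"))

# strsplit() refuses a factor, so this call fails
failed <- inherits(try(strsplit(genes, ","), silent = TRUE), "try-error")

# Converting to character first works; note where the closing parenthesis goes
parts <- strsplit(as.character(genes), ",")
parts[[1]]  # "Mndal" "Mnda" "Ifi203" "Ifi202b"
```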

Update

Based on the new dataset for 'x2', we can merge the two datasets by the 'chr' column and strsplit 'Genes.x' and 'Genes.y' in the merged dataset ('xNew'). Checking whether any element of 'Genes.x' occurs in 'Genes.y' gives a logical index, which is then used to subset 'x2'. Note that the index has one element per row of 'xNew', so this relies on the merged rows being in the same order as the rows of 'x2'.

 xNew <- merge(x1, x2[,c(1,4)], by='chr')
 indx <- mapply(function(x,y) any(x %in% y), 
      strsplit(xNew$Genes.x, ','), strsplit(xNew$Genes.y, ','))
 x2[!indx,]
 # chr start  end                Genes
 #1   1  8861 8868             Olfr1193
 #3   2  8501 8520   Chia,Chi3l3,Chi3l4
 #4   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
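If you would rather not depend on the merged rows lining up with 'x2', a possible variant is to look up the matching 'x1' row for each 'x2' row with match(). This is only a sketch, not part of the answer above, and it assumes each 'chr' value appears at most once in 'x1'. The names x1_genes, x2_genes and shared are made up for illustration.

```r
# Data from the question, with Genes kept as character
x1 <- data.frame(
  chr = c(1, 2, 3), start = c(8401, 8001, 4001), end = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b", "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr = c(1, 1, 2, 3), start = c(8861, 8405, 8501, 4321),
  end = c(8868, 8420, 8520, 4670),
  Genes = c("Olfr1193", "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4", "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# For every row of x2, fetch the gene list of the x1 row with the same chr
# (assumption: chr is unique in x1)
x1_genes <- strsplit(x1$Genes[match(x2$chr, x1$chr)], ",")
x2_genes <- strsplit(x2$Genes, ",")

# One logical per x2 row: TRUE if it shares a gene with its x1 counterpart
shared <- mapply(function(a, b) any(a %in% b), x1_genes, x2_genes)

result <- x2[!shared, ]
result
```

Because `shared` is built directly from the rows of 'x2', it always has one element per 'x2' row, in the same order.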
