Reputation: 107
I have 2 dataframes.
df1:

col1  col2 col3 col4 col5
name1 A    23   x    y
name1 A    29   x    y
name1 B    17   x    y
name1 A    77   x    y
df2:

col1 col2 col3
B    17   LL1
Z    193  KK1
A    77   LO9
Y    80   LK2
I want to return the rows of df1 whose (col2, col3) pair does not match any (col1, col2) pair in df2.
The output should be:

col1  col2 col3 col4 col5
name1 A    23   x    y
name1 A    29   x    y
The solution I found:
unique.rows <- function(df1, df2) {
  out <- NULL
  for (i in 1:nrow(df1)) {
    found <- FALSE
    # compare this row's (col2, col3) against every (col1, col2) pair in df2
    for (j in 1:nrow(df2)) {
      if (all(df1[i,2:3] == df2[j,1:2])) {
        found <- TRUE
        break
      }
    }
    # keep the row only if no match was found
    if (!found) out <- rbind(out, df1[i,])
  }
  out
}
This solution works fine, but initially I was applying it to small dataframes. Now my df1 has about 10k rows and df2 has about 7 million rows, and it has been running for the last 2 days. Could anyone please suggest a faster way to do this?
Upvotes: 1
Views: 149
Reputation: 7475
Try:
> df1[!paste(df1$col2, df1$col3) %in% paste(df2$col1, df2$col2), ]
col1 col2 col3 col4 col5
1 name1 A 23 x y
2 name1 A 29 x y
Upvotes: 3
Reputation: 60924
What is probably biting you is the line:
if (!found) out <- rbind(out, df1[i,])
You continuously grow a data.frame, which causes the operating system to allocate new memory for the object. I would recommend you preallocate a data.frame with enough room and then assign the right output to the right index. This should speed things up by several orders of magnitude.
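For example, here is a minimal sketch of the preallocation idea, keeping your loop structure but recording which rows to keep in a pre-sized logical vector and subsetting df1 once at the end (the function name unique.rows.prealloc is just illustrative):

unique.rows.prealloc <- function(df1, df2) {
  # one preallocated flag per row of df1, instead of growing `out` with rbind()
  keep <- logical(nrow(df1))
  for (i in 1:nrow(df1)) {
    found <- FALSE
    for (j in 1:nrow(df2)) {
      if (all(df1[i,2:3] == df2[j,1:2])) {
        found <- TRUE
        break
      }
    }
    keep[i] <- !found   # assign into the preallocated slot
  }
  df1[keep, ]           # single subset at the end
}

Note that the inner loop over df2 is still the dominant cost for 7 million rows, so a vectorized approach such as the one in the other answer will be much faster overall.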
In addition, R is vectorized, so there is often no need for an explicit loop; see for example the answer by @ttmaccer. You could also take a look at data.table, which is lightning fast for these kinds of operations.
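For instance, a minimal data.table sketch of the same anti-join, assuming the column names shown in the question:

library(data.table)

dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# keep the rows of df1 whose (col2, col3) pair has no match
# among the (col1, col2) pairs of df2 (an anti-join)
result <- dt1[!dt2, on = c(col2 = "col1", col3 = "col2")]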
Upvotes: 2