Silvia
Silvia

Reputation: 405

Create a new dataframe according to the contrast between two similar df

I have a dataframe made like this:

  X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3

After several steps (not important which one) i obtained this df:

  X Y Z T
  1 2 4 2
  3 2 NA 4
  7 5 NA 3

i want to obtain a new dataframe made by only the rows which didn't change during the steps; the result would be this one:

 X  Y  Z  T
 1  2  4  2
 7  5  NA 3

How could I do?

Upvotes: 9

Views: 222

Answers (4)

ira
ira

Reputation: 2644

I am afraid that neither semi join, nor intersect or merge are the correct answers. merge and intersect will not handle duplicate rows properly. semi join will change order of the rows.

From this perspective, I think the only correct one so far is akrun's.

You could also do something like:

df1[rowSums(((df1 == df2) | (is.na(df1) & is.na(df2))), na.rm = T) == ncol(df1),]

But I think akrun's way is more elegant and likely to perform better in terms of speed.

Upvotes: 3

www
www

Reputation: 39154

Another dplyr solution: semi_join.

dt1 %>% semi_join(dt2, by = colnames(.))
  X Y  Z T
1 1 2  4 2
2 7 5 NA 3

Data

dt1 <- read.table(text = "X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "  X Y Z T
  1 2 4 2
                  3 2 NA 4
                  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 3

talat
talat

Reputation: 70266

You can use dplyr's intersect function:

library(dplyr)
intersect(d1, d2)
#  X Y  Z T
#1 1 2  4 2
#2 7 5 NA 3

This is a data.frame-equivalent of base R's intersect function.

In case you're working with data.tables, that package also provides such a function:

library(data.table)
setDT(d1)
setDT(d2)
fintersect(d1, d2)
#   X Y  Z T
#1: 1 2  4 2
#2: 7 5 NA 3

Upvotes: 4

akrun
akrun

Reputation: 887118

One option with base R would be to paste the rows of each dataset together and compare (==) to create a logical vector which we use for subsetting the new dataset

dfO[do.call(paste, dfO) == do.call(paste, df),]
#   X Y  Z T
#1 1 2  4 2
#3 7 5 NA 3

where 'dfO' is the old dataset and 'df' is the new

Upvotes: 4

Related Questions