How to match data frames based on column and impose condition?

Question

I have two data frames. Here is an example:

x <- rep(c(0,1),3)
y <- c(1999,2000,2001,2002,2001,2002)
z <- data.frame(x,y)
x1 <- rep(0,12)
y1 <- c(1998,1999,1999,2000,1999,2001,1999,2000,2001,2002,2001,2002)
z1 <- data.frame(x1, y1)

The desired result, newdf, is built as follows: take the first two rows of z (rows 1 and 2) and find their matches in z1 by year. Then take the next two rows (rows 3 and 4) and again find their matches in z1 by year, and so on. The merge function returns every possible combination of matches; however, I want each pair of consecutive rows from z to be matched only once.

newdf =(0  1999
        0  2000
        0  2001
        0  2002
        0  2001
        0  2002)

Any suggestions would be appreciated.

BrodieG · Accepted Answer

Assuming what you're trying to do is grab two rows from z, match them to z1 finding the first eligible match for each, and then remove the already matched rows from both z and z1, here is a solution:

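# Start with an empty result that has the same columns as z1
new.df <- data.frame(x1=numeric(), y1=numeric())
while(nrow(z) > 0) {
  # First row of z1 whose year equals each of the next two years in z
  match.1 <- match(z$y[1], z1$y1)
  match.2 <- match(z$y[2], z1$y1)
  new.df <- rbind(new.df, z1[match.1, ], z1[match.2, ])
  # Drop the rows just used from both data frames (z and z1 are consumed)
  z <- z[-(1:2), ]
  z1 <- z1[-c(match.1, match.2), ]
}
row.names(new.df) <- NULL
new.df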
#    x1   y1
#  1  0 1999
#  2  0 2000
#  3  0 2001
#  4  0 2002
#  5  0 2001
#  6  0 2002

This matches your desired output, but your desired output is super ambiguous because all the x1 values are 0. It would be a lot easier if your first column in z1 had more distinguishing values to help infer what you want.

Also, this will break if z contains values that are not in z1, or if z has an odd number of rows, but I'll leave adding the logic to fix it to you. Additionally, if you're going to do this for a large z, you will need to pre-size new.df and assign by index instead of calling rbind as I have done here, because growing a data frame with rbind inside a loop gets slow.
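For what it's worth, here is one way those caveats might be handled. This is only a sketch: match_pairs is a hypothetical helper name, not part of the loop above. It assumes that a year in z with no remaining match in z1 should simply be skipped, and that a leftover single row in z is matched on its own. Instead of growing a data frame, it collects row indices of z1 in a pre-sized vector and subsets z1 once at the end.

```r
# Hypothetical helper (not from the original answer): pairwise first-match
# lookup that tolerates missing years and an odd number of rows in z.
match_pairs <- function(z, z1) {
  available <- rep(TRUE, nrow(z1))       # z1 rows not yet consumed
  picked <- rep(NA_integer_, nrow(z))    # pre-sized: one slot per row of z
  starts <- seq_len(nrow(z))[c(TRUE, FALSE)]  # 1, 3, 5, ... (empty if z is empty)
  for (start in starts) {
    rows <- start:min(start + 1, nrow(z))     # a pair, or one leftover row
    # Look up each year among the z1 rows that are still available
    candidates <- which(available)
    hits <- candidates[match(z$y[rows], z1$y1[candidates])]
    picked[rows] <- hits                      # stays NA where a year has no match
    available[hits[!is.na(hits)]] <- FALSE    # consume the matched rows
  }
  # Assumption: unmatched years are dropped rather than kept as NA rows
  out <- z1[picked[!is.na(picked)], , drop = FALSE]
  row.names(out) <- NULL
  out
}

# The example data from the question
z  <- data.frame(x = rep(c(0, 1), 3),
                 y = c(1999, 2000, 2001, 2002, 2001, 2002))
z1 <- data.frame(x1 = rep(0, 12),
                 y1 = c(1998, 1999, 1999, 2000, 1999, 2001,
                        1999, 2000, 2001, 2002, 2001, 2002))

res <- match_pairs(z, z1)
res$y1
# 1999 2000 2001 2002 2001 2002
```

Unlike the loop above, this leaves z and z1 unmodified, so it can be rerun on the same inputs. Whether skipping unmatched years is the right behaviour depends on what you actually need.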
