Jack

Reputation: 167

How to match data frames based on column and impose condition?

I have two data frames. Here is an example:

x <- rep(c(0,1),3)
y <- c(1999,2000,2001,2002,2001,2002)
z <- data.frame(x,y)
x1 <- rep(0,12)
y1 <- c(1998,1999,1999,2000,1999,2001,1999,2000,2001,2002,2001,2002)
z1 <- data.frame(x1, y1)

Basically, newdf is built as follows: take the first two rows of z (rows 1 and 2) and find their matches in z1 by year. Then take the next two rows (rows 3 and 4) and again find their matches in z1 by year, and so on. The merge function returns every possible combination of matches; however, I want each pair of consecutive rows from z to be matched only once, without repeats.

newdf =(0  1999
        0  2000
        0  2001
        0  2002
        0  2001
        0  2002)

Any suggestions would be appreciated.

Upvotes: 1

Views: 1070

Answers (2)

makarand kulkarni

Reputation: 301

plyr::join might help here. Rename the y and y1 columns in z and z1 to a common name, say "years", and use:

abc=plyr::join(z,z1,by="years",match="first",type="left")

If you specifically need to process two rows at a time, you will need to run this in a loop.
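To see why a plain first-match join is not enough on its own, here is a minimal base R sketch of roughly what a left join with match="first" does. It is not the plyr implementation; first.hit and abc.base are names made up for illustration, and the "years" column is the rename suggested above.

```r
# Example data from the question, with the key column renamed to "years"
z  <- data.frame(x = rep(c(0, 1), 3),
                 years = c(1999, 2000, 2001, 2002, 2001, 2002))
z1 <- data.frame(x1 = rep(0, 12),
                 years = c(1998, 1999, 1999, 2000, 1999, 2001,
                           1999, 2000, 2001, 2002, 2001, 2002))

# For each year in z, match() gives the index of the FIRST equal year in z1
first.hit <- match(z$years, z1$years)

# Left join that keeps only the first match for each row of z
abc.base <- cbind(z, x1 = z1$x1[first.hit])
```

Note that repeated years in z (2001 and 2002 here) point at the same row of z1 both times, which is why a loop that sets aside already-used rows is needed if each z1 row may be matched only once.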

Upvotes: 1

BrodieG

Reputation: 52647

Assuming what you're trying to do is grab two rows from z, match them to z1 finding the first eligible match for each, and then remove the already matched rows from both z and z1, here is a solution:

# Empty result with the same column names as z1 (rbind drops zero-row frames anyway)
new.df <- data.frame(x1=numeric(), y1=numeric())
while(nrow(z) > 0) {
  match.1 <- match(z$y[1], z1$y1)
  match.2 <- match(z$y[2], z1$y1)
  new.df <- rbind(new.df, z1[match.1, ], z1[match.2, ])
  z <- z[-(1:2), ]
  z1 <- z1[-c(match.1, match.2), ]
}
row.names(new.df) <- NULL
new.df
#    x1   y1
#  1  0 1999
#  2  0 2000
#  3  0 2001
#  4  0 2002
#  5  0 2001
#  6  0 2002

This matches your desired output, but your desired output is super ambiguous because all the x1 values are 0. It would be a lot easier if your first column in z1 had more distinguishing values to help infer what you want.

Also, this will break if there are values in z that are not in z1, or if z doesn't have an even number of rows, but I'll leave adding the logic to fix that to you. Additionally, if you're going to do this for a large z, you will need to pre-size new.df and assign by index instead of using rbind as I have done here, since growing a data frame with rbind inside a loop gets slow.
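One possible way to address those caveats is sketched below. It is a variation rather than the exact loop above: it marks a z1 row as used after every single match instead of after each pair (an assumption about what is wanted), so an odd number of rows is not a problem. The helper name match_without_reuse is made up for illustration. Years in z with no unused match in z1 come back as NA rows.

```r
# Hypothetical helper: match each row of z to the first unused row of z1 by year
match_without_reuse <- function(z, z1) {
  hits <- rep(NA_integer_, nrow(z))   # pre-sized: one z1 row index per row of z
  used <- rep(FALSE, nrow(z1))        # flags z1 rows that were already matched
  for (i in seq_len(nrow(z))) {
    candidates <- which(!used & z1$y1 == z$y[i])
    if (length(candidates) > 0) {
      hits[i] <- candidates[1]        # take the first eligible row
      used[candidates[1]] <- TRUE     # and make it unavailable afterwards
    }
  }
  out <- z1[hits, , drop = FALSE]     # unmatched entries become NA rows
  row.names(out) <- NULL
  out
}

# Usage with the data from the question
z  <- data.frame(x = rep(c(0, 1), 3),
                 y = c(1999, 2000, 2001, 2002, 2001, 2002))
z1 <- data.frame(x1 = rep(0, 12),
                 y1 = c(1998, 1999, 1999, 2000, 1999, 2001,
                        1999, 2000, 2001, 2002, 2001, 2002))
new.df2 <- match_without_reuse(z, z1)
```

Because the row indices are collected in a pre-sized vector and z1 is subset once at the end, there is no repeated rbind, and neither z nor z1 is modified.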

Upvotes: 1
