simoncolumbus
simoncolumbus

Reputation: 566

R replace values in column based on match between columns

I have two dataframes, each with the same columns. Some columns have the same values in the same order in both dataframes (X1, X2 below). Other columns have the same values, but in a different order (Y1). This is only a problem for some levels of first variables (here, the order of rows in Y1 differs for X1 == "a", but not X1 == "b"). Example:

df1 <- data.frame("X1" = c("a", "a", "a", "b", "b", "b"),
                  "X2" = c("1", "2", "3", "1", "2", "3"),
                  "Y1" = c("d", "d", "f", "g", "h", "i"))
df2 <- data.frame("X1" = c("a", "a", "a", "b", "b", "b"),
                  "X2" = c("1", "2", "3", "1", "2", "3"),
                  "Y1" = c("f", "d", "d", "g", "h", "i"))

I would like to change the values of df2$X1 and df2$X2 such that the two dataframes are matched on values of Y1.

I would like to change X1 and X2 rather than Y1 because there are many Y variables. I would like to do this only for df$X1 == "a".

The output should looks like this:

df2 <- data.frame("X1" = c("a", "a", "a", "b", "b", "b"),
                  "X2" = c("3", "1", "2", "1", "2", "3"),
                  "Y1" = c("f", "d", "d", "g", "h", "i"))

Upvotes: 1

Views: 972

Answers (1)

prosoitos
prosoitos

Reputation: 7327

What is a little tricky in your situation is that you have duplicates in the Y1 columns which correspond to different values in the X2 columns. So you will have to make these unique.

First, make sure that your Y1 columns are character vectors and not factors:

df1 <- data.frame("X1" = c("a", "a", "a", "b", "b", "b"),
                  "X2" = c("1", "2", "3", "1", "2", "3"),
                  "Y1" = c("d", "d", "f", "g", "h", "i"),
                  stringsAsFactors = F)

df2 <- data.frame("X1" = c("a", "a", "a", "b", "b", "b"),
                  "X2" = c("1", "2", "3", "1", "2", "3"),
                  "Y1" = c("f", "d", "d", "g", "h", "i"),
                  stringsAsFactors = F)

Give unique names to your Y1 duplicates:

df1$Y1uniq <- make.unique(df1$Y1)

df2$Y1uniq <- make.unique(df2$Y1)

Then you can use match() using those uniques values (and remove that column once you don't need it anymore):

df1[match(df2$Y1uniq, df1$Y1uniq), ][ , 1:3]

Output:

  X1 X2 Y1
3  a  3  f
1  a  1  d
2  a  2  d
4  b  1  g
5  b  2  h
6  b  3  i

Upvotes: 1

Related Questions