Recode data in two dataframes using R

Question

I have two data frames as follows;

data1 <- as.data.frame(matrix(0, ncol = 2, nrow = 5))
data1$V1 <- c("AA", "BB", "CC", "DD", "EE")
data1$V2 <- c(1, 2, 3, 4, 5)

> data1
   V1 V2
 1 AA  1
 2 BB  2
 3 CC  3
 4 DD  4
 5 EE  5

data2 <- as.data.frame(matrix(0, ncol = 2, nrow = 9))
data2$V1 <- c("AA", "AA", "AA", "BB", "BB","BB","CC","CC","DD")
data2$V2 <- c("BB", "CC", "EE", "CC", "DD","EE","DD","EE","EE")

> data2
  V1 V2
1 AA BB
2 AA CC
3 AA EE
4 BB CC
5 BB DD
6 BB EE
7 CC DD
8 CC EE
9 DD EE

I want to recode all values in "data2" dataframe based on the values of dataframe "data1'. Expected result is as follows;

In my real dataset "data1" dataframe would have 100,000 range of rows with unique values.

I found several examples with "plyr" package with "mapvalue()", "revalue()" functions, which I think can be used to recode values in single data frames. But couldn't figure out an efficient method to do what I want.

akrun · Accepted Answer

We can do this with base R by creating a key/value vector from 'data1' to populate the values in 'data2'

data2[] <- setNames(data1$V2, data1$V1)[as.matrix(data2)]
data2
#  V1 V2
#1  1  2
#2  1  3
#3  1  5
#4  2  3
#5  2  4
#6  2  5
#7  3  4
#8  3  5
#9  4  5

Or we can use dplyr with match

library(dplyr)
data2 %>% 
       mutate_each(funs(data1$V2[match(., data1$V1)]))
#   V1 V2
#1  1  2
#2  1  3
#3  1  5
#4  2  3
#5  2  4
#6  2  5
#7  3  4
#8  3  5
#9  4  5

Recode data in two dataframes using R

Answers (1)

Related Questions