Reputation: 83
I have two data frames as follows;
data1 <- as.data.frame(matrix(0, ncol = 2, nrow = 5))
data1$V1 <- c("AA", "BB", "CC", "DD", "EE")
data1$V2 <- c(1, 2, 3, 4, 5)
> data1
V1 V2
1 AA 1
2 BB 2
3 CC 3
4 DD 4
5 EE 5
data2 <- as.data.frame(matrix(0, ncol = 2, nrow = 9))
data2$V1 <- c("AA", "AA", "AA", "BB", "BB","BB","CC","CC","DD")
data2$V2 <- c("BB", "CC", "EE", "CC", "DD","EE","DD","EE","EE")
> data2
V1 V2
1 AA BB
2 AA CC
3 AA EE
4 BB CC
5 BB DD
6 BB EE
7 CC DD
8 CC EE
9 DD EE
I want to recode all values in "data2" dataframe based on the values of dataframe "data1'. Expected result is as follows;
> data3
V1 V2
1 1 2
2 1 3
3 1 5
4 2 3
5 2 4
6 2 5
7 3 4
8 3 5
9 4 5
In my real dataset "data1" dataframe would have 100,000 range of rows with unique values.
I found several examples with "plyr" package with "mapvalue()", "revalue()" functions, which I think can be used to recode values in single data frames. But couldn't figure out an efficient method to do what I want.
Upvotes: 1
Views: 448
Reputation: 887223
We can do this with base R
by creating a key/value vector from 'data1' to populate the values in 'data2'
data2[] <- setNames(data1$V2, data1$V1)[as.matrix(data2)]
data2
# V1 V2
#1 1 2
#2 1 3
#3 1 5
#4 2 3
#5 2 4
#6 2 5
#7 3 4
#8 3 5
#9 4 5
Or we can use dplyr
with match
library(dplyr)
data2 %>%
mutate_each(funs(data1$V2[match(., data1$V1)]))
# V1 V2
#1 1 2
#2 1 3
#3 1 5
#4 2 3
#5 2 4
#6 2 5
#7 3 4
#8 3 5
#9 4 5
Upvotes: 1