Reputation: 1472
let's create example data:
df <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"), X1=c("A", "B", "C", "D", "F"),
X2=c("B", "A", "D", "F", "C"))
df2 <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
A=c("3", "4", "2", "1", "5"),
B=c("6", "2", "5", "1", "1"),
C=c("1", "4", "5", "2", "3"),
D=c("67", "67", "63", "61", "62"),
F=c("31", "33", "35", "31", "38"))
So I have two data frames and I want to match values from df2 to df by date and X1 and X2 and create new variables for those. What makes this tricky for me is that matched values in df2 are in colnames. End result should look like this:
> result
date X1 X2 Var1 Var2
1 2017-01-01 A B 3 6
2 2017-01-02 B A 2 4
3 2017-01-03 C D 5 63
4 2017-01-04 D F 61 31
5 2017-01-05 F C 38 3
result <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
X1=c("A", "B", "C", "D", "F"),
X2=c("B", "A", "D", "F", "C"),
Var1=c("3", "2", "5", "61", "38"),
Var2=c("6", "4", "63", "31", "3"))
I wanted to use mapvalues, but couldn't figure it out. Second thought was to go long format (melt) with df2 and try then, but failed there as well.
Ok, here is my best try, just feels that there could be more efficient way, if you have to create multiple (>50) new variables to data frame.
df2.long <- melt(df2, id.vars = c("date"))
df$Var1 <- na.omit(merge(df, df2.long, by.x = c("date", "X1"), by.y = c("date", "variable"), all.x = FALSE, all.y = TRUE))[,4]
df$Var2 <- na.omit(merge(df, df2.long, by.x = c("date", "X2"), by.y = c("date", "variable"), all.x = FALSE, all.y = TRUE))[,5]
Upvotes: 6
Views: 8422
Reputation: 13680
Using dplyr
and tidyr
:
df2_m <- group_by(df2, date) %>%
gather('X1', 'var', -date)
left_join(df, df2_m) %>%
left_join(df2_m, by = c('date', 'X2' = 'X1')) %>%
rename(Var1 = var.x, Var2 = var.y) -> result
Upvotes: 4
Reputation: 24074
A possibility with mapply
:
df$Var1 <- mapply(function(day, col) df2[df2$date==day, as.character(col)],
day=df$date, col=df$X1)
df$Var2 <- mapply(function(day, col) df2[df2$date==day, as.character(col)],
day=df$date, col=df$X2)
df
# date X1 X2 Var1 Var2
#1 2017-01-01 A B 3 6
#2 2017-01-02 B A 2 4
#3 2017-01-03 C D 5 63
#4 2017-01-04 D F 61 31
#5 2017-01-05 F C 38 3
NB:
If you have more columns to modify (not just 2 like in your example), you can use lapply
to loop over the columns X.
:
df[, paste0("Var", 1:2)] <- lapply(df[,paste0("X", 1:2)],
function(value) {
mapply(function(day, col) df2[df2$date==day, as.character(col)],
day=df$date, col=value)})
Upvotes: 3
Reputation: 1576
Using melt and match:
df2l<-melt(df2, measure=c("A","B","C","D","F"))
Indices <- match(paste(df$date, df$X1), paste(df2l$date,df2l$variable))
df$Var1 <- df2l$value[Indices]
Indices2 <- match(paste(df$date, df$X2), paste(df2l$date,df2l$variable))
df$Var2 <- df2l$value[Indices2]
Upvotes: 1
Reputation: 92282
An double melt > join > dcast option using data.table
library(data.table) # v>=1.10.0
dcast(
melt(setDT(df), 1L)[ # melt the first table by date
melt(setDT(df2), 1L), # melt the second table by date
on = .(date, value = variable), # join by date and the letters
nomatch = 0L], # remove everything that wasn't matched
date ~ variable, # convert back to long format
value.var = c("value", "i.value")) # take both values columns
# date value_X1 value_X2 i.value_X1 i.value_X2
# 1: 2017-01-01 A B 3 6
# 2: 2017-01-02 B A 2 4
# 3: 2017-01-03 C D 5 63
# 4: 2017-01-04 D F 61 31
# 5: 2017-01-05 F C 38 3
Upvotes: 3
Reputation: 887028
We can use match
to get the column index of 'df2' from the 'X1' and 'X2' columns, cbind
with the sequence of rows, use the row/column index to extract the values in 'df2', and assign the output to create the 'Var' columns
df[paste0("Var", 1:2)] <- lapply(df[2:3], function(x)
df2[-1][cbind(1:nrow(df2), match(x, names(df2)[-1]))])
df
# date X1 X2 Var1 Var2
#1 2017-01-01 A B 3 6
#2 2017-01-02 B A 2 4
#3 2017-01-03 C D 5 63
#4 2017-01-04 D F 61 31
#5 2017-01-05 F C 38 3
Upvotes: 2