Reputation: 81
I have a data frame df1
df1<- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4))
ID CLASS
1 A 1
2 B 1
3 A 2
4 A 1
5 B 4
and another two data frames A
and B
> A<- data.frame(CLASS = c(1,2,3), DESCRIPTION = c("Unknown", "Tall", "Short"))
CLASS DESCRIPTION
1 1 Unknown
2 2 Tall
3 3 Short
> B <- data.frame(CLASS = c(1,2,3,4), DESCRIPTION = c("Big", "Small", "Medium", "Very Big"))
CLASS DESCRIPTION
1 1 Big
2 2 Small
3 3 Medium
4 4 Very Big
I want to merge these three data frames depending on the ID and class of df1
to have something like this:
ID CLASS DESCRIPTION
1 A 1 Unknown
2 B 1 Big
3 A 2 Tall
4 A 1 Unknown
5 B 4 Very Big
I know I can merge it as df1 <- merge(df1, A, by = "CLASS")
but I can't find a way to add the conditional (maybe an "if" is too much) to also merge B according to the ID.
I need to have an efficient way to do this as I am applying it to over 2M rows.
Upvotes: 1
Views: 61
Reputation: 47300
A dplyr
solution:
library(dplyr)
bind_rows(lst(A,B),.id="ID") %>% inner_join(df1)
# ID CLASS DESCRIPTION
# 1 A 1 Unknown
# 2 A 1 Unknown
# 3 A 2 Tall
# 4 B 1 Big
# 5 B 4 Very Big
Upvotes: 1
Reputation: 11514
To merge multiple dataframes in one go, Reduce
is often helpful:
out <- Reduce(function(x,y) merge(x,y, by = "CLASS", all.x=T), list(df1, A, B))
out
CLASS ID DESCRIPTION.x DESCRIPTION.y
1 1 A Unknown Big
2 1 B Unknown Big
3 1 A Unknown Big
4 2 A Tall Small
5 4 B <NA> Very Big
As you can see, columns that were present in all dataframes were added a suffix (default merge
behavior). This allows you to apply whatever logic you want to get the final column you wish for. For instance,
out$Description <- ifelse(out$ID == "A", as.character(out$DESCRIPTION.x), as.character(out$DESCRIPTION.y))
> out
CLASS ID DESCRIPTION.x DESCRIPTION.y Description
1 1 A Unknown Big Unknown
2 1 B Unknown Big Big
3 1 A Unknown Big Unknown
4 2 A Tall Small Tall
5 4 B <NA> Very Big Very Big
Note that ifelse
is vectorized and quite efficient.
Upvotes: 1
Reputation: 10761
Add the ID
variable to A
and B
, rbind
A
and B
together, and use ID
and CLASS
to merge
:
A$ID = 'A'
B$ID = 'B'
AB <- rbind(A, B)
merge(df1, AB, by = c('ID', 'CLASS'))
ID CLASS DESCRIPTION
1 A 1 Unknown
2 A 1 Unknown
3 A 2 Tall
4 B 1 Big
5 B 4 Very Big
I would suggest using stringsAsFactors = FALSE
when creating the data:
df1 <- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4),
stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3),
DESCRIPTION = c("Unknown", "Tall", "Short"),
stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4),
DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
stringsAsFactors = FALSE)
Upvotes: 2