janf
janf

Reputation: 81

Merging different data frames depending on column value

I have a data frame df1

df1<- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4))
 ID CLASS
1  A     1
2  B     1
3  A     2
4  A     1
5  B     4

and another two data frames A and B

   > A<- data.frame(CLASS = c(1,2,3), DESCRIPTION = c("Unknown", "Tall", "Short"))
  CLASS DESCRIPTION
1     1     Unknown
2     2        Tall
3     3       Short

> B <- data.frame(CLASS = c(1,2,3,4), DESCRIPTION = c("Big", "Small", "Medium", "Very Big"))
  CLASS DESCRIPTION
1     1         Big
2     2       Small
3     3      Medium
4     4    Very Big

I want to merge these three data frames depending on the ID and class of df1 to have something like this:

      ID CLASS DESCRIPTION
1  A     1     Unknown
2  B     1         Big
3  A     2        Tall
4  A     1     Unknown
5  B     4    Very Big

I know I can merge it as df1 <- merge(df1, A, by = "CLASS") but I can't find a way to add the conditional (maybe an "if" is too much) to also merge B according to the ID. I need to have an efficient way to do this as I am applying it to over 2M rows.

Upvotes: 1

Views: 61

Answers (3)

moodymudskipper
moodymudskipper

Reputation: 47300

A dplyr solution:

library(dplyr)
bind_rows(lst(A,B),.id="ID") %>% inner_join(df1)
#   ID CLASS DESCRIPTION
# 1  A     1     Unknown
# 2  A     1     Unknown
# 3  A     2        Tall
# 4  B     1         Big
# 5  B     4    Very Big

Upvotes: 1

coffeinjunky
coffeinjunky

Reputation: 11514

To merge multiple dataframes in one go, Reduce is often helpful:

out <- Reduce(function(x,y) merge(x,y, by = "CLASS", all.x=T), list(df1, A, B))
out
  CLASS ID DESCRIPTION.x DESCRIPTION.y
1     1  A       Unknown           Big
2     1  B       Unknown           Big
3     1  A       Unknown           Big
4     2  A          Tall         Small
5     4  B          <NA>      Very Big

As you can see, columns that were present in all dataframes were added a suffix (default merge behavior). This allows you to apply whatever logic you want to get the final column you wish for. For instance,

out$Description <- ifelse(out$ID == "A", as.character(out$DESCRIPTION.x), as.character(out$DESCRIPTION.y))
> out
  CLASS ID DESCRIPTION.x DESCRIPTION.y Description
1     1  A       Unknown           Big     Unknown
2     1  B       Unknown           Big         Big
3     1  A       Unknown           Big     Unknown
4     2  A          Tall         Small        Tall
5     4  B          <NA>      Very Big    Very Big

Note that ifelse is vectorized and quite efficient.

Upvotes: 1

bouncyball
bouncyball

Reputation: 10761

Add the ID variable to A and B, rbind A and B together, and use ID and CLASS to merge:

A$ID = 'A'
B$ID = 'B'

AB <- rbind(A, B)

merge(df1, AB, by = c('ID', 'CLASS'))

  ID CLASS DESCRIPTION
1  A     1     Unknown
2  A     1     Unknown
3  A     2        Tall
4  B     1         Big
5  B     4    Very Big

I would suggest using stringsAsFactors = FALSE when creating the data:

df1 <- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3), 
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4), 
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

Upvotes: 2

Related Questions