zoe

Reputation: 311

Merge using multiple values in a column in R?

I have two data frames, one of which has a column holding multiple values, e.g.

 A    B
10    400, 500, 600
20    700, 800, 900

 C    D
10    500
20    900

Can I use the merge function to join the two tables so that each value in D is matched against any of the values in B?

Many thanks.

Upvotes: 0

Views: 234

Answers (2)

Maurits Evers

Reputation: 50738

I'm not entirely sure what you'd like to do; perhaps you can edit your question to include your expected output. Is this what you're after?

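library(tidyverse)
df1 %>%
    # split B into three columns named _1, _2, _3
    separate(B, into = paste0("_", 1:3), sep = ", ") %>%
    # reshape from wide to long: one row per (A, value) pair
    gather(key, val, 2:4) %>%
    rename(B = val) %>%
    select(A, B) %>%
    mutate(B = as.numeric(B)) %>%
    # keep all rows of both tables, matching B against D
    full_join(df2, by = c("B" = "D"))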
#   A   B  C
#1 10 400 NA
#2 20 700 NA
#3 10 500 10
#4 20 800 NA
#5 10 600 NA
#6 20 900 20

Explanation: Split entries in df1$B into different columns, convert data from wide into long format, then do a full outer join by matching entries df1$B with entries df2$D.

Or with an inner join, which keeps only the rows where a value in B has a match in D:

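library(tidyverse)
df1 %>%
    # split B into three columns named _1, _2, _3
    separate(B, into = paste0("_", 1:3), sep = ", ") %>%
    # reshape from wide to long: one row per (A, value) pair
    gather(key, val, 2:4) %>%
    rename(B = val) %>%
    select(A, B) %>%
    mutate(B = as.numeric(B)) %>%
    # keep only rows where B matches a value in D
    inner_join(df2, by = c("B" = "D"))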
#   A   B  C
#1 10 500 10
#2 20 900 20
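
The split-then-reshape step in the pipelines above could probably be shortened with tidyr's separate_rows, which splits a delimited column directly into one row per value. This is a swapped-in alternative to separate + gather, not part of the original answer; the data frames are re-created here only so the sketch runs on its own, and the names long and result are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Same data as in the question (re-created so the sketch is self-contained)
df1 <- data.frame(A = c(10, 20),
                  B = c("400, 500, 600", "700, 800, 900"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c(10, 20), D = c(500, 900))

# One row per value of B, converted to numeric so it can be joined with D
long <- df1 %>%
    separate_rows(B, sep = ", ") %>%
    mutate(B = as.numeric(B))

# Full join keeps unmatched values of B (C is NA for them);
# swap in inner_join() to keep only the matches
result <- full_join(long, df2, by = c("B" = "D"))
print(result)
```

Unlike separate, this does not need to know in advance how many values each entry of B holds.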

Sample data

df1 <- read.table(text =
    "A            B
 10     '400, 500, 600'
 20     '700, 800, 900'", header = TRUE)

df2 <- read.table(text =
    "C         D
10        500
20        900", header = TRUE)

Upvotes: 1

Thomas Guillerme

Reputation: 1877

I'm not sure either what exactly you are asking. I assume you want something like merge(df1, df2, "B"), where column D of your second data set (C, D) was meant to be B. In any case, I assume you want to "fuzzy" match D with B (i.e. check whether any value in B equals D). You can use %in% and strsplit for that:

## The data
df1 <- data.frame(A = c(10,20), B = c("400, 500, 600", "700, 800, 900"))
df2 <- data.frame(C = c(10,20), D = c(500, 900))

## Select the matching elements between df1$B and df2$D
matching <- mapply(function(x,y) any(x %in% y), df2$D, strsplit(as.character(df1$B), split = ", "))

## Combining the data frames
cbind(df1[matching, ], df2[matching, ])
#   A             B  C   D
#1 10 400, 500, 600 10 500
#2 20 700, 800, 900 20 900

## Combining the data frames without the B column (results similar to merge(df1, df2, "B") if df2 also had a "B" column )
cbind(df1[matching, 1], df2[matching, ])
#  df1[matching, 1]  C   D
#1               10 10 500
#2               20 20 900

Upvotes: 0
