Reputation: 311
I have two data frames, one of which has a column holding multiple comma-separated values, e.g.:
A   B
10  400, 500, 600
20  700, 800, 900

C   D
10  500
20  900
Can I use the merge function to join the two tables, matching each value in D against any of the values in B?
Many thanks.
Upvotes: 0
Views: 234
Reputation: 50738
I'm not entirely sure what you'd like to do; perhaps you can edit your question to include your expected output. Is this what you're after?
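library(tidyverse)
df1 %>%
  # Split B into three columns named _1, _2, _3
  separate(B, into = paste0("_", 1:3), sep = ", ") %>%
  # Reshape from wide to long: one row per value
  gather(key, val, 2:4) %>%
  rename(B = val) %>%
  select(A, B) %>%
  mutate(B = as.numeric(B)) %>%
  # Full outer join: keep every row, matched or not
  full_join(df2, by = c("B" = "D"))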
# A B C
#1 10 400 NA
#2 20 700 NA
#3 10 500 10
#4 20 800 NA
#5 10 600 NA
#6 20 900 20
Explanation: Split the entries in df1$B into separate columns, reshape the data from wide to long format, then do a full outer join matching the entries of df1$B against the entries of df2$D.
Or, with an inner join, which keeps only the rows that have a match:
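library(tidyverse)
df1 %>%
  # Split B into three columns named _1, _2, _3
  separate(B, into = paste0("_", 1:3), sep = ", ") %>%
  # Reshape from wide to long: one row per value
  gather(key, val, 2:4) %>%
  rename(B = val) %>%
  select(A, B) %>%
  mutate(B = as.numeric(B)) %>%
  # Inner join: keep only rows where df1$B equals df2$D
  inner_join(df2, by = c("B" = "D"))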
# A B C
#1 10 500 10
#2 20 900 20
Sample data:
df1 <- read.table(text =
"A B
10 '400, 500, 600'
20 '700, 800, 900'", header = TRUE)
df2 <- read.table(text =
"C D
10 500
20 900", header = TRUE)
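If the number of values per cell is not always three, the split-and-reshape step can be done in one call. The following is a minimal sketch, assuming tidyr's separate_rows() is available in your installed version; the names long and result are illustrative and not part of the answer above.

```r
library(tidyr)
library(dplyr)

# Same sample data as above, built with data.frame() for self-containment
df1 <- data.frame(A = c(10, 20),
                  B = c("400, 500, 600", "700, 800, 900"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c(10, 20), D = c(500, 900))

# One row per comma-separated value, however many values a cell holds
long <- separate_rows(df1, B, sep = ", ")
long$B <- as.numeric(long$B)

# Keep only the rows where a value from B equals a value in D
result <- inner_join(long, df2, by = c("B" = "D"))
print(result)
```

This avoids hard-coding the column count in paste0("_", 1:3) and the 2:4 column range.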
Upvotes: 1
Reputation: 1877
I'm not sure either what your question actually is. I assume you want something like merge(df1, df2, "B"), where in your second data set (C, D), D was supposed to be B. Anyway, I assume you want to "fuzzy" match D with B (i.e. is there any value in B that equals D). You can use %in% (a wrapper around match) and strsplit for that:
## The data
df1 <- data.frame(A = c(10,20), B = c("400, 500, 600", "700, 800, 900"))
df2 <- data.frame(C = c(10,20), D = c(500, 900))
## Select the matching elements between df1$B and df2$D
matching <- mapply(function(x,y) any(x %in% y), df2$D, strsplit(as.character(df1$B), split = ", "))
## Combining the data frames
cbind(df1[matching, ], df2[matching, ])
# A B C D
#1 10 400, 500, 600 10 500
#2 20 700, 800, 900 20 900
## Combining the data frames without the B column (results similar to merge(df1, df2, "B") if df2 also had a "B" column )
cbind(df1[matching, 1], df2[matching, ])
# df1[matching, 1] C D
#1 10 10 500
#2 20 20 900
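Note that mapply compares the two inputs pairwise, so row i of df2 is only checked against row i of df1. If a value in D should be allowed to match a row at any position in df1, one option is to expand B into one row per value and then call merge. This is a minimal sketch in base R; the names parts, long and merged are illustrative.

```r
## The data
df1 <- data.frame(A = c(10, 20),
                  B = c("400, 500, 600", "700, 800, 900"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c(10, 20), D = c(500, 900))

## Split each B string into its individual values
parts <- strsplit(df1$B, ", ", fixed = TRUE)

## One row per value, repeating A and the original B string as needed
long <- data.frame(A = rep(df1$A, lengths(parts)),
                   B = rep(df1$B, lengths(parts)),
                   D = as.numeric(unlist(parts)))

## Ordinary merge on the expanded column
merged <- merge(long, df2, by = "D")
print(merged)
```

Because the join key is now a plain numeric column, the usual merge arguments (all, all.x, all.y) control whether unmatched rows are kept.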
Upvotes: 0