Reputation:
I have two datasets I would like to merge based on a partial match of an ID that is a string of ID's separated by commas. So my first dataset has a ID field that is basic (one string) but the second has an ID string that includes multiple different IDs separated by a comma. For example:
TIME GROUPID y1 y2
1 A 0 1
1 B 1 1
2 C 1 0
2 D 0 0
3 E 1 0
TIME GROUPID x1
1 A,B 4
2 B,C 2
3 E 3
I would like to merge these two datasets based on TIME and GROUPID but for GROUPID I would like a partial match. So "A" would match "A" but it would also match "A,E" or "B,A". The result:
TIME GROUPID y1 y2 x1
1 A 0 1 4
1 B 1 1 4
2 C 1 0 2
2 D 0 0 NA
3 E 1 0 3
Thanks so much in advance!
Upvotes: 1
Views: 421
Reputation: 28441
Try cSplit
from splitstackshape
. With the argument direction="long"
we give the lookup data.frame a standard merging layout:
library(splitstackshape)
lkup <- cSplit(df2, "GROUPID", direction="long")
merge(df1, lkup, by=c("TIME", "GROUPID"), all.x=T)
# TIME GROUPID y1 y2 x1
# 1 1 A 0 1 4
# 2 1 B 1 1 4
# 3 2 C 1 0 2
# 4 2 D 0 0 NA
# 5 3 E 1 0 3
data
df1 <- structure(list(TIME = c(1L, 1L, 2L, 2L, 3L), GROUPID = structure(1:5, .Label = c("A",
"B", "C", "D", "E"), class = "factor"), y1 = c(0L, 1L, 1L, 0L,
1L), y2 = c(1L, 1L, 0L, 0L, 0L)), .Names = c("TIME", "GROUPID",
"y1", "y2"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(TIME = 1:3, GROUPID = structure(1:3, .Label = c("A,B",
"B,C", "E"), class = "factor"), x1 = c(4L, 2L, 3L)), .Names = c("TIME",
"GROUPID", "x1"), class = "data.frame", row.names = c(NA, -3L
))
Upvotes: 4