I am attempting to link two datasets using fastLink. I have manually found matches between some cases that fastLink failed to pair, and I am trying to understand why this may be. To test what's going on, I ran the following simple match:
library(fastLink)
a <- data.frame("x"=1:40,
                "date"=c(3603, 3603, 4115, 3805, 4448, 4448, 4758, 4758, 4758, 4623, 4623, 4623, 4623, 4623, 4623, 4623, 4623, 4667, 4667, 4840,
                         4840, 4840, 4693, 4623, 6247, 5715, 6227, 6227, 5988, 6331, 6331, 5988, 6268, 6268, 6268, 6268, 6275, 6275, 5829, 6275))
b <- data.frame("x"=1:82,
                "date"=c(3042, 3104, 3302, 3330, 3342, 3407, 3713, 3882, 3882, 4043, 4249, 4175, 4184, 4184, 4184, 4366, 4239, 4117, 4127, 4127,
                         4239, 4498, 5094, 4848, 4975, 5148, 5185, 5213, 5309, 5521, 5604, 5604, 5604, 5604, 5897, 5976, 5976, 6002, 6058, 6102,
                         6158, 6184, 6184, 6184, 6184, 6255, 6256, 6256, 6275, 6275, 6284, 6284, 6284, 6303, 6303, 6312, 6332, 6340, 6352, 6352,
                         6366, 6366, 6366, 6366, 6366, 6366, 6367, 6375, 6375, 6396, 6403, 6407, 6443, 6443, 6443, 6443, 6494, 6494, 6494, 6494,
                         6494, 6494))
out <- fastLink(a,
                b,
                varnames="date",
                numeric.match="date",
                threshold.match=.01)
key_tab <- data.frame("inds.a"=out$matches$inds.a,
                      "inds.b"=out$matches$inds.b,
                      "a_date"=unlist(a[out$matches$inds.a, "date"]),
                      "b_date"=unlist(b[out$matches$inds.b, "date"]),
                      "posterior"=out$posterior)
key_tab
  inds.a inds.b a_date b_date          posterior
1     37     49   6275   6275 0.7025187055498182
2     38     50   6275   6275 0.7025187055498182
3     40     50   6275   6275 0.7025187055498182
4     30     57   6331   6332 0.7025187055498182
5     31     57   6331   6332 0.7025187055498182
So, fastLink returned matches for 5 of the rows in data frame a: three matched to exactly the same value in b, and two matched to a value one away, and all five are assigned the same posterior. I understand this is because, by default, cut.a.num = 1, so matches one away are considered exact matches.
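As a sanity check that does not involve fastLink at all, the near-misses can be listed directly in base R. This is only a minimal sketch: the vector names a_date and b_date and the 2-to-3 window are my own choices for illustration, and the vectors are copied from the date columns above.

```r
# Date columns copied from the data frames a and b above
a_date <- c(3603, 3603, 4115, 3805, 4448, 4448, 4758, 4758, 4758, 4623,
            4623, 4623, 4623, 4623, 4623, 4623, 4623, 4667, 4667, 4840,
            4840, 4840, 4693, 4623, 6247, 5715, 6227, 6227, 5988, 6331,
            6331, 5988, 6268, 6268, 6268, 6268, 6275, 6275, 5829, 6275)
b_date <- c(3042, 3104, 3302, 3330, 3342, 3407, 3713, 3882, 3882, 4043,
            4249, 4175, 4184, 4184, 4184, 4366, 4239, 4117, 4127, 4127,
            4239, 4498, 5094, 4848, 4975, 5148, 5185, 5213, 5309, 5521,
            5604, 5604, 5604, 5604, 5897, 5976, 5976, 6002, 6058, 6102,
            6158, 6184, 6184, 6184, 6184, 6255, 6256, 6256, 6275, 6275,
            6284, 6284, 6284, 6303, 6303, 6312, 6332, 6340, 6352, 6352,
            6366, 6366, 6366, 6366, 6366, 6366, 6367, 6375, 6375, 6396,
            6403, 6407, 6443, 6443, 6443, 6443, 6494, 6494, 6494, 6494,
            6494, 6494)

# Absolute difference for every (row of a, row of b) pair: a 40 x 82 matrix
d <- abs(outer(a_date, b_date, "-"))

# Pairs that are 2 or 3 apart, i.e. candidates for a partial match
near <- which(d >= 2 & d <= 3, arr.ind = TRUE)
close <- data.frame(inds.a = near[, "row"],
                    inds.b = near[, "col"],
                    a_date = a_date[near[, "row"]],
                    b_date = b_date[near[, "col"]],
                    diff   = d[near])
close
```

Among the rows this lists is row 3 of a (4115) against row 18 of b (4117), which are 2 apart, so at least one candidate for a partial match exists in the data.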
But why are no partial matches returned, e.g. with dates 2 or 3 away? The data contain such matches. Here are some things I tried:
- cut.p.num=3 (or 2, or 5, or 100) changes nothing
- partial.match="date" (as suggested here) causes there to be no matches returned
- dedupe.matches=FALSE just returns more matches for these same 5 rows
- return.all=TRUE causes there to be no matches returned (why?)

Thanks in advance, and apologies if I'm missing something obvious.