Elizabeth

How can I use fastLink in R to get partial numeric matches?

I am attempting to link two datasets using fastLink. I have manually identified matches between some records that fastLink failed to pair, and I am trying to understand why it missed them. To test what's going on, I ran the following simple match:

library(fastLink) 

a <- data.frame("x"=1:40,
                "date"=c(3603, 3603, 4115, 3805, 4448, 4448, 4758, 4758, 4758, 4623, 4623, 4623, 4623, 4623, 4623, 4623, 4623, 4667, 4667, 4840,
                        4840, 4840, 4693, 4623, 6247, 5715, 6227, 6227, 5988, 6331, 6331, 5988, 6268, 6268, 6268, 6268, 6275, 6275, 5829, 6275))

b <- data.frame("x"=1:82,
                "date"=c(3042, 3104, 3302, 3330, 3342, 3407, 3713, 3882, 3882, 4043, 4249, 4175, 4184, 4184, 4184, 4366, 4239, 4117, 4127, 4127,
                         4239, 4498, 5094, 4848, 4975, 5148, 5185, 5213, 5309, 5521, 5604, 5604, 5604, 5604, 5897, 5976, 5976, 6002, 6058, 6102,
                         6158, 6184, 6184, 6184, 6184, 6255, 6256, 6256, 6275, 6275, 6284, 6284, 6284, 6303, 6303, 6312, 6332, 6340, 6352, 6352,
                         6366, 6366, 6366, 6366, 6366, 6366, 6367, 6375, 6375, 6396, 6403, 6407, 6443, 6443, 6443, 6443, 6494, 6494, 6494, 6494,
                         6494, 6494))

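out <- fastLink(a, 
                b, 
                varnames="date",          # variable(s) to compare
                numeric.match="date",     # compare "date" as a number rather than a string
                threshold.match=.01)      # return pairs with posterior >= 0.01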

key_tab <- data.frame("inds.a"=out$matches$inds.a, 
                      "inds.b"=out$matches$inds.b, 
                      "a_date"=unlist(a[out$matches$inds.a, "date"]),
                      "b_date"=unlist(b[out$matches$inds.b, "date"]),
                      "posterior"=out$posterior)

key_tab
  inds.a inds.b a_date b_date          posterior
1     37     49   6275   6275 0.7025187055498182
2     38     50   6275   6275 0.7025187055498182
3     40     50   6275   6275 0.7025187055498182
4     30     57   6331   6332 0.7025187055498182
5     31     57   6331   6332 0.7025187055498182

So fastLink returned matches for 5 of the 40 rows in a: three matched exactly the same value in b, two matched a value that differs by 1, and all five were assigned the same posterior. I understand this is because cut.a.num defaults to 1, so values within 1 of each other are treated as exact matches.
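To make the agreement levels concrete, here is a minimal base-R sketch (not fastLink's own code) of a three-level numeric comparison. It assumes a pair fully agrees when the absolute difference is at most cut.a.num (default 1), partially agrees when the difference is above that but at most cut.p.num (assumed here to default to 2.5), and disagrees otherwise. The helper classify_numeric and its cut_a/cut_p arguments are hypothetical. Whether fastLink uses the middle level at all for a variable may also depend on its partial.match argument, which is worth checking in the package documentation. The inputs are values from the vectors above; for example, row 3 of a (4115) and row 18 of b (4117) differ by 2.

```r
# Hypothetical helper, NOT fastLink's implementation: classify every
# (x, y) pair by absolute difference into three agreement levels.
# cut_a and cut_p stand in for fastLink's cut.a.num and cut.p.num
# (defaults assumed to be 1 and 2.5).
classify_numeric <- function(x, y, cut_a = 1, cut_p = 2.5) {
  d <- abs(outer(x, y, "-"))           # |x[i] - y[j]| for all pairs
  lvl <- matrix("disagree", nrow = nrow(d), ncol = ncol(d),
                dimnames = list(as.character(x), as.character(y)))
  lvl[d <= cut_p] <- "partial"         # within the partial band
  lvl[d <= cut_a] <- "agree"           # within the full-agreement band
  lvl
}

# Values from the question's data: a[3], a[37], a[30] and b[18], b[49], b[57]
a_date <- c(4115, 6275, 6331)
b_date <- c(4117, 6275, 6332)

levels_ab <- classify_numeric(a_date, b_date)
print(levels_ab)
# Diagonal pairs differ by 2, 0 and 1: "partial", "agree", "agree"
```

Running classify_numeric(a$date, b$date) on the full vectors would list every pair in each level, which makes it easy to compare against the pairs fastLink actually returns.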

But why are no partial matches returned, e.g. for dates that differ by 2 or 3? The data contain such pairs. Here are some things I tried:

Thanks in advance, and apologies if I'm missing something obvious.
