Reputation: 8107
I have the following 2 vectors:
vec1 <- c("Less than high school", "High school", "Junior college", "Bachelor", "Graduate")
vec2 <- c("High school (n=998)", "Bachelor (n=359)", "Junior college (n=141)", "Graduate (n=211)", "Less than high school (n=211)")
And I have the following command that lists the position of each vec1 element in the vec2 vector:
(index <- sapply(vec1, grep, x = vec2))
Less than high school High school Junior college Bachelor
5 1 3 2
Graduate
4
So far so good. However, things break if a dollar sign is in the text of the vectors:
vec1 <- c("Up to $27,600", "$27,600 to $54,900", "$54,900 to $82,100", "$82,100 to $109,000", "$109,000 to $137,000", "$137,000 and higher")
vec2 <- c("$27,600 to $54,900 (n=683)", "$109,000 to $137,000 (n=61)", "$54,900 to $82,100 (n=393)", "$137,000 and higher (n=164)", "$82,100 to $109,000 (n=225)", "Up to $27,600 (n=1070)")
(index <- sapply(vec1, grep, x = vec2))
$`Up to $27,600`
integer(0)
$`$27,600 to $54,900`
integer(0)
$`$54,900 to $82,100`
integer(0)
$`$82,100 to $109,000`
integer(0)
$`$109,000 to $137,000`
integer(0)
$`$137,000 and higher`
integer(0)
I understand that grep is reading the dollar sign as "end of the line", but how would I go about telling it to treat it just as text? I've tried adding escape slashes into the vectors, but that didn't work:
if (any(grepl("\\$", vec1))) {
  vec1 <- gsub("\\$", "\\\\$", vec1)
  vec2 <- gsub("\\$", "\\\\$", vec2)
}
Upvotes: 1
Views: 44
Reputation: 38510
Finally, I can use pmatch in an answer. No sapply necessary:
First vectors:
pmatch(vec1, vec2)
[1] 5 1 3 2 4
Second vectors:
pmatch(vec1, vec2)
[1] 6 1 3 5 2 4
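A note on why this works: pmatch only matches from the start of each string, and every vec1 entry here happens to be the beginning of exactly one vec2 entry. A minimal sketch of that behaviour, using a made-up one-element table (tbl is a hypothetical name, not from the question):

```r
# Hypothetical one-element table, only to illustrate prefix matching
tbl <- "High school (n=998)"

# "High" is the start of the string, so it is a unique partial match
prefix_hit <- pmatch("High", tbl)

# "school" occurs inside the string but not at the start, so no match
interior <- pmatch("school", tbl)

prefix_hit  # 1
interior    # NA
```

If the labels could appear somewhere other than the start of the vec2 strings, this approach would presumably return NA for them.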
charmatch will also do the trick.
Second vectors:
charmatch(vec1, vec2)
[1] 6 1 3 5 2 4
If adding the labels to the vector is important, use setNames:
setNames(charmatch(vec1, vec2), vec1)
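If the sapply/grep approach from the question is preferred, one possible alternative is to pass fixed = TRUE, which asks grep to treat each pattern as a literal string instead of a regular expression. The sketch below reuses the second pair of vectors from the question. As an aside, the escaping attempt there likely failed because vec2 was escaped too, which puts literal backslashes into the text being searched.

```r
vec1 <- c("Up to $27,600", "$27,600 to $54,900", "$54,900 to $82,100",
          "$82,100 to $109,000", "$109,000 to $137,000", "$137,000 and higher")
vec2 <- c("$27,600 to $54,900 (n=683)", "$109,000 to $137,000 (n=61)",
          "$54,900 to $82,100 (n=393)", "$137,000 and higher (n=164)",
          "$82,100 to $109,000 (n=225)", "Up to $27,600 (n=1070)")

# fixed = TRUE: "$" is matched as a plain character, not as an
# end-of-line anchor. Each vec1 element is passed to grep as the pattern.
index <- sapply(vec1, grep, x = vec2, fixed = TRUE)

# Drop the names to compare with the pmatch result
unname(index)  # 6 1 3 5 2 4
```

Unlike pmatch, this finds the label anywhere in the string, so it returns more than one position per label if a label occurs in several vec2 entries.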
The difference between pmatch and charmatch, from ?pmatch:
charmatch is similar to pmatch with duplicates.ok true, the differences being that it differentiates between no match and an ambiguous partial match, it does match empty strings, and it does not allow multiple exact matches.
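A small sketch of those differences, using a toy table in the spirit of the examples on the help page (tbl is just an illustrative name):

```r
tbl <- c("mean", "median")

# Ambiguous partial match: "me" is a prefix of both entries
ambiguous_p <- pmatch("me", tbl)     # NA
ambiguous_c <- charmatch("me", tbl)  # 0

# No match at all: both functions return NA
none_p <- pmatch("x", tbl)           # NA
none_c <- charmatch("x", tbl)        # NA

# Empty strings: pmatch never matches them, charmatch does
empty_p <- pmatch("", "")            # NA
empty_c <- charmatch("", "")         # 1
```

So with charmatch a 0 signals "ambiguous" and NA signals "not found", while pmatch returns NA in both cases.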
Upvotes: 4