Phil

Reputation: 8107

How to tell grep to treat dollar signs as literal text when the function is used within sapply?

I have the following 2 vectors:

vec1 <- c("Less than high school", "High school", "Junior college", "Bachelor", "Graduate")
vec2 <- c("High school (n=998)", "Bachelor (n=359)", "Junior college (n=141)", "Graduate (n=211)", "Less than high school (n=211)")

And I have the following command, which returns the position of each element of vec1 within vec2.

(index <- sapply(vec1, grep, x = vec2))
Less than high school           High school        Junior college              Bachelor 
                5                     1                     3                     2 
         Graduate 
                4

So far so good. However, things break if a dollar sign is in the text of the vectors:

vec1 <- c("Up to $27,600", "$27,600 to $54,900", "$54,900 to $82,100", "$82,100 to $109,000", "$109,000 to $137,000", "$137,000 and higher")
vec2 <- c("$27,600 to $54,900 (n=683)", "$109,000 to $137,000 (n=61)", "$54,900 to $82,100 (n=393)", "$137,000 and higher (n=164)", "$82,100 to $109,000 (n=225)", "Up to $27,600 (n=1070)")
(index <- sapply(vec1, grep, x = vec2))
$`Up to $27,600`
integer(0)

$`$27,600 to $54,900`
integer(0)

$`$54,900 to $82,100`
integer(0)

$`$82,100 to $109,000`
integer(0)

$`$109,000 to $137,000`
integer(0)

$`$137,000 and higher`
integer(0)

I understand that grep is reading the dollar sign as the regular-expression anchor for "end of the line", but how would I go about telling it to treat it as plain text? I've tried escaping the dollar signs with backslashes in both vectors, but that didn't work:

if (any(grepl("\\$", vec1))) {
   vec1 <- gsub("\\$", "\\\\$", vec1)
   vec2 <- gsub("\\$", "\\\\$", vec2)
}

Upvotes: 1

Views: 44

Answers (1)

lmo

Reputation: 38510

Finally, I can use pmatch in an answer. No sapply necessary:

With the first pair of vectors:

pmatch(vec1, vec2)
[1] 5 1 3 2 4

With the second pair of vectors:

pmatch(vec1, vec2)
[1] 6 1 3 5 2 4

charmatch will also do the trick.

With the second pair of vectors:

charmatch(vec1, vec2)
[1] 6 1 3 5 2 4

If adding the labels to the vector is important, use setNames:

setNames(charmatch(vec1, vec2), vec1)
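If you would rather keep the original grep approach, one option is to pass fixed = TRUE, which makes grep treat each pattern as a literal string rather than a regular expression, so the dollar sign loses its special meaning. A minimal sketch using the second pair of vectors from the question (the name index is reused from the question):

```r
vec1 <- c("Up to $27,600", "$27,600 to $54,900", "$54,900 to $82,100",
          "$82,100 to $109,000", "$109,000 to $137,000", "$137,000 and higher")
vec2 <- c("$27,600 to $54,900 (n=683)", "$109,000 to $137,000 (n=61)",
          "$54,900 to $82,100 (n=393)", "$137,000 and higher (n=164)",
          "$82,100 to $109,000 (n=225)", "Up to $27,600 (n=1070)")

# fixed = TRUE is passed through sapply to grep, so every element of vec1
# is searched for as a literal string instead of a regular expression
index <- sapply(vec1, grep, x = vec2, fixed = TRUE)

# drop the names to show just the positions
unname(index)
# [1] 6 1 3 5 2 4
```

The escaping attempt in the question appears to fail because it rewrites vec2 as well: the text being searched then contains literal backslashes, so the escaped patterns no longer match. Escaping only vec1 should behave like the fixed = TRUE version here.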

The difference between pmatch and charmatch, from ?pmatch:

charmatch is similar to pmatch with duplicates.ok true, the differences being that it differentiates between no match and an ambiguous partial match, it does match empty strings, and it does not allow multiple exact matches.
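To make the quoted differences concrete, here is a small sketch adapted from the examples on the ?pmatch help page. The lookup table tbl is only an illustrative name and is not part of the question's data:

```r
# illustrative lookup table (not from the question)
tbl <- c("mean", "median", "mode")

pmatch("m", tbl)       # NA: an ambiguous partial match is reported as no match
charmatch("m", tbl)    # 0:  an ambiguous partial match is flagged separately
charmatch("x", tbl)    # NA: no match at all

pmatch("med", tbl)     # 2:  unique partial match
charmatch("med", tbl)  # 2:  unique partial match

pmatch("", "")         # NA: pmatch does not match empty strings
charmatch("", "")      # 1:  charmatch does match empty strings
```

For the vectors in the question every label is a unique prefix of exactly one entry in vec2, so both functions return the same positions.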

Upvotes: 4
