Mayank Bansal

Reputation: 1125

Unexpected behavior in pmatch while matching '+' in R

I am trying to match the '+' symbol inside my string using the pmatch function.

Target = "18+"

pmatch("+",Target)

[1] NA

I observe similar behavior if I use match or grepl also. If I try and use gsub, I get the following output.

gsub("+","~",Target)

[1] "~1~8~+~"

Can someone please explain the reason for this behavior and suggest a viable solution to my problem?

Upvotes: 3

Views: 847

Answers (3)

Jean V. Adams

Reputation: 4784

The function pmatch() attempts to match the beginning of each element, not a portion in the middle, so the issue there has nothing to do with the plus symbol, +. For example, the first two calls to pmatch() below give NA as the result, and the next three give 1 (indicating a match at the beginning of the first element).

Target <- "18+"
pmatch("8", Target)
pmatch("+", Target)
pmatch("1", Target)
pmatch("18", Target)
pmatch("18+", Target)

The function gsub() can be used to match and replace portions of elements using regular expressions. The plus sign has special meaning in regular expressions, so you need to use escape characters to indicate that you are interested in the plus sign as a single character. For example, the following three lines of code give "1~+", "18~", and "~" as the results, respectively.

gsub("8", "~", Target)
gsub("\\+", "~", Target)
gsub("18\\+", "~", Target)
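As a small sketch of two alternatives to the backslash escape (the variable names class_result and plus_pos are only illustrative, not from the original post): a character class such as "[+]" also makes the plus sign literal, and regexpr() with fixed = TRUE reports where the plus sign sits in the string.

```r
Target <- "18+"

# Inside a character class, "+" is an ordinary character, so no backslashes are needed.
class_result <- gsub("[+]", "~", Target)

# regexpr() returns the position of the first match (-1 if there is none).
# fixed = TRUE treats the pattern as a plain string rather than a regular expression.
plus_pos <- as.integer(regexpr("+", Target, fixed = TRUE))

print(class_result)  # "18~"
print(plus_pos)      # 3
```

Either form avoids having to remember how many backslashes R needs inside a string literal.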

Upvotes: 1

Simon O'Hanlon

Reputation: 59970

It's a forward-looking (prefix) match. So it tries to match "+" against the beginning of each element in table (the second argument of pmatch). This fails ("+" != "1"), so NA is returned. You must also be careful with the return value. I'm going to quote from the help for the closely related charmatch, because it explains the rules succinctly and better than I ever could. Note one difference: when a match is ambiguous, charmatch returns 0, whereas pmatch returns nomatch (NA by default).

Exact matches are preferred to partial matches (those where the value to be matched has an exact match to the initial part of the target, but the target is longer).

If there is a single exact match or no exact match and a unique partial match then the index of the matching value is returned; if multiple exact or multiple partial matches are found then 0 is returned and if no match is found then nomatch is returned.

###Examples from ?charmatch###
#  Multiple partial matches found - returns 0
charmatch("m",   c("mean", "median", "mode")) # returns 0

#  One unique partial match found - return index of match in table
charmatch("med", c("mean", "median", "mode")) # returns 2

#  One exact match found and preferred over partial match - index of exact match returned
charmatch("med", c("med", "median", "mode")) # returns 1
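For comparison, here is a minimal sketch of how pmatch and charmatch differ on the same inputs. The vector name choices and the result names are made up for illustration. The main point is that an ambiguous partial match gives NA from pmatch but 0 from charmatch.

```r
choices <- c("mean", "median", "mode")

# "m" is a prefix of all three elements, so the match is ambiguous.
ambiguous_pmatch    <- pmatch("m", choices)     # NA
ambiguous_charmatch <- charmatch("m", choices)  # 0

# "med" is a prefix of exactly one element, so both functions agree.
unique_pmatch    <- pmatch("med", choices)      # 2
unique_charmatch <- charmatch("med", choices)   # 2

# An exact match is preferred over a partial one.
exact_pmatch <- pmatch("med", c("med", "median", "mode"))  # 1
```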

To get a vector of matches to "+" in your string I'd use grepl...

Target <- c( "+" , "+18" , "18+" , "23+26" , "1234" )
grepl( "\\+" , Target )
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
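If you would rather not escape the plus sign, a possible variant is to pass fixed = TRUE so the pattern is taken literally. The sketch below reuses the Target vector from above; has_plus, plus_index, and with_plus are just illustrative names.

```r
Target <- c("+", "+18", "18+", "23+26", "1234")

# Logical vector: does each element contain a literal "+"?
has_plus <- grepl("+", Target, fixed = TRUE)

# Indices of the elements that contain "+".
plus_index <- which(has_plus)

# The matching elements themselves.
with_plus <- grep("+", Target, fixed = TRUE, value = TRUE)

print(has_plus)    # TRUE TRUE TRUE TRUE FALSE
print(plus_index)  # 1 2 3 4
print(with_plus)   # "+" "+18" "18+" "23+26"
```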

Upvotes: 5

zx8754

Reputation: 56149

Try this:

gsub("+", "~", Target, fixed = TRUE)

?gsub

fixed - logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.
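A short sketch of how this behaves when a string has more than one plus sign (the vector x is a made-up example, not from the question): gsub replaces every occurrence, while sub replaces only the first one in each element.

```r
x <- c("18+", "1+2+3")

# gsub() replaces all literal "+" characters in each element.
all_replaced <- gsub("+", "~", x, fixed = TRUE)

# sub() replaces only the first literal "+" in each element.
first_replaced <- sub("+", "~", x, fixed = TRUE)

print(all_replaced)    # "18~"   "1~2~3"
print(first_replaced)  # "18~"   "1~2+3"
```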

Upvotes: 1
