Walter Mair

Reputation: 23

Exact match with str_locate regex in R

I am trying to run an if() conditional on someone being in the US Senate, but I get the wrong results because I cannot get an exact match in R. I tried word boundaries (\b) and beginning/end anchors (^ and $), but they don't seem to work, and I don't know why.

> splits[[1]][4]
[1] "Ohio State Senate, 1979-1983"
> is.numeric(str_locate(splits[[1]][4], "\bSenator\b"))
[1] TRUE
> is.numeric(str_locate(splits[[1]][4], "/^Senator$/"))
[1] TRUE
> pattern <- "\bSenator\b"
> is.numeric(str_locate(splits[[1]][4], pattern))
[1] TRUE

Basically, the above should all yield false as my data only uses Senator if it is the US Senate, not a state senate.

Your help is greatly appreciated!

Thank you, Walter

Upvotes: 2

Views: 1942

Answers (3)

dsummersl

Reputation: 6737

The help docs for str_locate specify that it returns an integer matrix. A little experimentation shows that when there is no match, it returns NA for both start and end.

You can test against NA:

> library(stringr)
> v <- "Ohio State Senate, 1979-1983"

> str_locate(v, "\\bSenator\\b")
start end
[1,]    NA  NA
> is.na(str_locate(v, "\\bSenator\\b")[,c('start')])
start
TRUE

> str_locate(v, "Senate")
start end
[1,]    12  17
> is.na(str_locate(v, "Senate")[,c('start')])
start
FALSE

Personally, I'd just use grep:

> grep("Senate",v)
[1] 1
> grep("Senator",v)
integer(0)

If you want to use word boundary matches you need to escape the backslash: \\b, not \b.
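
To see why the doubled backslash matters, here is a minimal sketch using only base R (nchar and grepl rather than stringr); the variable names are just for illustration. In an R string literal, "\b" is a single backspace character, while "\\b" is the two characters that the regex engine reads as a word boundary.

```r
v <- "Ohio State Senate, 1979-1983"

# "\b" is one character (a backspace); "\\b" is a backslash followed by "b"
n_single <- nchar("\b")
n_double <- nchar("\\b")

# With the backslash escaped, the regex engine sees a word boundary
has_senator <- grepl("\\bSenator\\b", v)                     # "Senator" does not occur in v
has_senate  <- grepl("\\bSenate\\b", v)                      # "Senate" occurs as a whole word
us_senator  <- grepl("\\bSenator\\b", "Senator, 2001-2007")  # made-up entry for comparison

print(c(n_single = n_single, n_double = n_double))
print(c(has_senator = has_senator, has_senate = has_senate, us_senator = us_senator))
```

grepl returns a plain TRUE/FALSE, so its result can go straight into if().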

Upvotes: 0

Metrics

Reputation: 15458

x<-"Ohio State Senate, 1979-1983"
kk<-unlist(strsplit(x," "))
kk %in% state.name
[1]  TRUE FALSE FALSE FALSE

OR,

any(!is.na(str_locate(x, state.name)[, "start"])) # TRUE if any state name matches, i.e. the senator is a state senator

Upvotes: 1

Simon O'Hanlon

Reputation: 59990

The function works as expected; you are just surprised by the return type. If it doesn't find a match, then NA is returned. More specifically, an NA_integer_ is returned (which is stored internally as the most negative 32-bit integer value, -2147483648).

x <- "Ohio State Senate, 1979-1983"
str_locate( x , "\\bSenator\\b")
#     start end
#[1,]    NA  NA

And an NA_integer_ is actually a numeric...

is.numeric( NA_integer_ )
#[1] TRUE

So it all works fine. Try !all( is.na( str_locate( x , "\\bSenator\\b") ) ) instead.
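
As a rough illustration of the difference between the two tests, the sketch below builds by hand the kind of matrix str_locate returns (assumption: a 1 x 2 integer matrix with columns "start" and "end", all NA when nothing matches). The names no_match and a_match are made up for the example, and it runs in base R without stringr.

```r
# Hand-built stand-ins for str_locate() results (an assumption, see above)
no_match <- matrix(NA_integer_, nrow = 1, ncol = 2,
                   dimnames = list(NULL, c("start", "end")))
a_match  <- matrix(c(12L, 17L), nrow = 1, ncol = 2,
                   dimnames = list(NULL, c("start", "end")))

# is.numeric() cannot tell the two apart: both are integer matrices
numeric_no_match <- is.numeric(no_match)
numeric_a_match  <- is.numeric(a_match)

# Testing for NA can
found_no_match <- !all(is.na(no_match))
found_a_match  <- !all(is.na(a_match))

print(c(numeric_no_match = numeric_no_match, numeric_a_match = numeric_a_match))
print(c(found_no_match = found_no_match, found_a_match = found_a_match))
```

The found_* values are ordinary logicals, so either one can be used directly as the condition in if().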

Upvotes: 1
