Reputation: 23
I am trying to run an if() conditional on whether someone was in the US Senate, but I get the wrong results because I cannot get an exact match in R. I tried word boundaries \b and beginning/end anchors ^$, but neither seems to work, and I don't know why.
> splits[[1]][4]
[1] "Ohio State Senate, 1979-1983"
> is.numeric(str_locate(splits[[1]][4], "\bSenator\b"))
[1] TRUE
> is.numeric(str_locate(splits[[1]][4], "/^Senator$/"))
[1] TRUE
> pattern <- "\bSenator\b"
> is.numeric(str_locate(splits[[1]][4], pattern))
[1] TRUE
All of the above should yield FALSE, since my data only uses "Senator" for the US Senate, not for a state senate.
Your help is greatly appreciated!
Thank you, Walter
Upvotes: 2
Views: 1942
Reputation: 6737
The help docs for str_locate specify that it returns an integer matrix. Playing with the function a little, on a non-match it returns NA.
You can test against NA:
> library(stringr)
> v <- "Ohio State Senate, 1979-1983"
> str_locate(v, "\\bSenator\\b")
start end
[1,] NA NA
> is.na(str_locate(v, "\\bSenator\\b")[,c('start')])
start
TRUE
> str_locate(v, "Senate")
start end
[1,] 12 17
> is.na(str_locate(v, "Senate")[,c('start')])
start
FALSE
Personally, I'd just use grep:
> grep("Senate",v)
[1] 1
> grep("Senator",v)
integer(0)
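Since the goal is an if() test, grepl() (the logical counterpart of grep()) may fit better, because it returns TRUE or FALSE directly instead of matching indices. A minimal sketch; the vector `bios` and its contents are made-up examples, not the asker's data:

```r
# Hypothetical biography strings, for illustration only
bios <- c("Ohio State Senate, 1979-1983",
          "U.S. Senator, 1999-2011")

# grepl() returns one TRUE/FALSE per element; note the doubled backslashes
is_us_senator <- grepl("\\bSenator\\b", bios)
print(is_us_senator)   # FALSE TRUE

# A single TRUE/FALSE is exactly what if() needs
if (grepl("\\bSenator\\b", bios[1])) {
  message("US Senate")
} else {
  message("not the US Senate")
}
```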
If you want to use word-boundary matches, you need to escape the backslash: \\b, not \b.
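The single-backslash version fails silently because R processes escapes in the string literal before the regex engine ever sees the pattern: "\b" is a backspace character, so "\bSenator\b" looks for "Senator" wrapped in backspaces. Likewise, R patterns take no /.../ delimiters, so "/^Senator$/" contains literal slashes. A small sketch of the difference; the raw-string line assumes R 4.0.0 or later:

```r
library(stringr)

# "\b" in an ordinary R string literal is a backspace, not a regex token
backspace_pat <- "\bSenator\b"
regex_pat     <- "\\bSenator\\b"   # the regex engine receives \bSenator\b
nchar(backspace_pat)               # 9: seven letters plus two backspaces
nchar(regex_pat)                   # 11

# Raw strings (R >= 4.0.0) avoid doubling the backslashes
identical(r"(\bSenator\b)", regex_pat)   # TRUE

# ^ and $ anchor the whole string; slashes are just part of the pattern
str_detect("Senator", "^Senator$")                      # TRUE
str_detect("Senator", "/^Senator$/")                    # FALSE
str_detect("Ohio State Senate, 1979-1983", regex_pat)   # FALSE
```

Note that ^Senator$ only matches when the entire string is "Senator", so it is the wrong tool for finding the word inside a longer biography line.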
Upvotes: 0
Reputation: 15458
x<-"Ohio State Senate, 1979-1983"
kk<-unlist(strsplit(x," "))
kk %in% state.name
[1] TRUE FALSE FALSE FALSE
OR,
library(stringr)
any(!is.na(str_locate(x, state.name)[, "start"]))  # if this is TRUE, a state name appears in x, so it is a state senate
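The strsplit() approach misses multi-word states such as "New York" or "North Carolina", because they are split into separate words before the %in% test. A sketch that instead searches the whole string for each state name with word boundaries; the helper name `mentions_state` and the sample strings are hypothetical:

```r
library(stringr)

# Hypothetical helper: TRUE if any US state name appears as whole words in
# the single string s (state.name comes from the base datasets package)
mentions_state <- function(s) {
  patterns <- paste0("\\b", state.name, "\\b")   # one pattern per state
  any(str_detect(s, patterns))                   # vectorised over patterns
}

mentions_state("Ohio State Senate, 1979-1983")         # TRUE
mentions_state("New York State Assembly, 1990-1994")   # TRUE (multi-word state)
mentions_state("U.S. Senator, 1999-2011")              # FALSE
```

A string such as "U.S. Senator from Ohio" would also return TRUE, so this check only works if the data never names the state for US senators.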
Upvotes: 1
Reputation: 59990
The function works as expected; you are just surprised by the return type. If it doesn't find a match, NA is returned. More specifically, an NA_integer_ is returned (internally, R represents NA_integer_ with the most negative 32-bit integer, -2147483648, so that value is not available as an ordinary integer).
library(stringr)
x <- "Ohio State Senate, 1979-1983"
str_locate( x , "\\bSenator\\b")
# start end
#[1,] NA NA
And an NA_integer_ is actually a numeric...
is.numeric( NA_integer_ )
#[1] TRUE
So it all works fine. Try !all( is.na( str_locate( x , "\\bSenator\\b") ) ) instead.
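To make the root cause concrete: is.numeric() asks about the type of the result, not whether a match was found, and str_locate() returns an integer matrix either way. A short sketch contrasting the type checks with an NA check; `has_senator` and the sample strings are hypothetical, added for illustration:

```r
library(stringr)

x <- "Ohio State Senate, 1979-1983"
m <- str_locate(x, "\\bSenator\\b")   # no match: a 1 x 2 integer matrix of NA

typeof(m)         # "integer", whether or not anything matched
is.numeric(m)     # TRUE, so this cannot tell a match from a non-match
is.numeric(NA)    # FALSE: a bare NA is logical; str_locate() uses NA_integer_

# Hypothetical helper: one TRUE/FALSE per input string, based on the start column
has_senator <- function(s) {
  loc <- str_locate(s, "\\bSenator\\b")   # n x 2 matrix, NA rows mean no match
  unname(!is.na(loc[, "start"]))
}

has_senator(c(x, "U.S. Senator, 1999-2011"))   # FALSE TRUE
```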
Upvotes: 1