Reputation: 41

R - Understanding output of regexpr

Can you please help me understand the output of regexpr? I am interested in the position of the match, which is 10 in the example below. But the output shows two values, 10 and 4. How do I capture only the number 10?

Is this output a vector of numbers?

text<-"World is beautiful"
out<-regexpr("beau",text)
out
#[1] 10
#attr(,"match.length")
#[1] 4
#attr(,"useBytes")
#[1] TRUE
out[1]
#[1] 10
out[2]
#[1] NA

Upvotes: 4

Answers (2)

Simon O'Hanlon

Reputation: 60000

out is a length 1L atomic vector with attributes...

str(out)
 atomic [1:1] 10
 - attr(*, "match.length")= int 4
 - attr(*, "useBytes")= logi TRUE

The value of out (try c(out) to drop the attributes) is 10, which is the start position of the match to your pattern within the string. attr(out, "match.length") is 4, which is the length of the match.

Your text string is one element long, hence out is one element long. Try regexpr("beau",rep(text,3)).
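As a minimal sketch of the points above (the variable names start, len and out3 are only illustrative), this is one way to pull the position and the match length apart, and to see what happens with a longer input:

```r
text <- "World is beautiful"
out <- regexpr("beau", text)

# as.integer() (like c()) drops the attributes and keeps only the position
start <- as.integer(out)
start
#[1] 10

# the match length lives in an attribute, not in a second element
len <- attr(out, "match.length")
len
#[1] 4

# position and length together recover the matched text
substring(text, start, start + len - 1)
#[1] "beau"

# with three input strings, the result has three elements
out3 <- regexpr("beau", rep(text, 3))
as.integer(out3)
#[1] 10 10 10
```

This is also why out[2] is NA in the question: out has only one element, and the 4 is stored as an attribute rather than as out[2].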

Upvotes: 2

CHP

Reputation: 17189

This is from the help page of regexpr, which you can open by typing ?regexpr in the R console:

regexpr returns an integer vector of the same length as text giving the starting position of the first match or -1 if there is none, with attribute "match.length", an integer vector giving the length of the matched text (or -1 for no match). The match positions and lengths are in characters unless useBytes = TRUE is used, when they are in bytes. If named capture is used there are further attributes "capture.start", "capture.length" and "capture.names".
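To illustrate the -1 convention from the help text, here is a small sketch with a vector of strings where one element has no match (texts, pos and found are just example names):

```r
texts <- c("World is beautiful", "Hello", "beau beau")
pos <- regexpr("beau", texts)

# one start position per input string; -1 means no match
as.integer(pos)
#[1] 10 -1  1

# match lengths follow the same convention
attr(pos, "match.length")
#[1]  4 -1  4

# logical index of the strings that matched
found <- as.integer(pos) != -1L
texts[found]
#[1] "World is beautiful" "beau beau"

# regmatches() extracts the matched text for the matching elements only
regmatches(texts, pos)
#[1] "beau" "beau"
```

Note that regexpr reports only the first match in each string, so "beau beau" gives position 1.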

Upvotes: 0
