Reputation: 135
I extract the numeric substring of word in the following way:
word="xyz9874"
pattern="[0-9]+"
x=gregexpr(pattern,word)
substr(word,start=x[[1]],stop=x[[1]]+attr(x[[1]],"match.length")-1)
[1] "9874"
Is there a simpler way to get this result in R?
Upvotes: 1
Views: 569
Reputation: 17090
Sure, use gsub with a backreference:
gsub( ".*?([0-9]+).*", "\\1", word )
Explanation: in most regex implementations, \1 is the backreference to the first subpattern matched, where a subpattern is the part of the pattern enclosed in parentheses. In an R string literal you need to escape the backslash, whichever type of quotation marks you use.
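A small sketch of the escaping, reusing word from the question. The single-quoted call is included only for comparison and is not part of the original answer.

```r
word <- "xyz9874"

# "\\1" is a two-character string: a backslash followed by the digit 1
nchar("\\1")

# The parentheses capture the digits; "\\1" substitutes the captured text
double_quoted <- gsub(".*?([0-9]+).*", "\\1", word)

# Single quotes need exactly the same escaping
single_quoted <- gsub('.*?([0-9]+).*', '\\1', word)

identical(double_quoted, single_quoted)
```

Both calls should return the same string, since R treats the two quote styles alike when it parses backslashes.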
The question mark, an idiom of the "extended" regular expressions, means that the preceding .* should not be greedy; in other words, it should take as little of the string as possible. Otherwise, the .* in the pattern .*([0-9]+) would match xyz987 and ([0-9]+) would match only 4. Alternatively, we can write
gsub( ".*[^0-9]+([0-9]+).*", "\\1", word )
but then we have a problem with strings that start with a number, because [^0-9]+ requires at least one non-digit character before the digits.
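The difference between these variants can be checked with a short sketch. The input "9874xyz" is a made-up example of a string that starts with a number; it does not come from the question.

```r
word <- "xyz9874"

# Greedy .* swallows all but the last digit, leaving one digit for the group
greedy <- gsub(".*([0-9]+).*", "\\1", word)

# Requiring a non-digit before the digits recovers the whole number here
with_nondigit <- gsub(".*[^0-9]+([0-9]+).*", "\\1", word)

# With leading digits there is no non-digit in front of a digit, so the
# pattern does not match and gsub returns the input unchanged
leading_digits <- gsub(".*[^0-9]+([0-9]+).*", "\\1", "9874xyz")

print(c(greedy = greedy, with_nondigit = with_nondigit,
        leading_digits = leading_digits))
```

On these inputs the three results should be "4", "9874" and "9874xyz" respectively, which is why the non-greedy version above is the safer choice.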
By the way, note that instead of [0-9] you can write \d, or rather \\d once the backslash is escaped in the R string:
gsub( ".*?(\\d+).*", "\\1", word )
Upvotes: 3