Marcelo Avila

Reputation: 2374

How to extract a number within, but excluding, brackets with str_extract() from package stringr?

There are plenty of regex questions out there, but I cannot solve the following in an elegant way.

I have the following vector and would like to extract only the numbers within the square brackets, that is, excluding the brackets themselves. The numbers may be negative. The question could also be phrased as:

How to extract only the first capturing group with the function str_extract from the {stringr} package?

string <- c("[1] cate 1", "[-1] cate -1", "[2] cate 2")
stringr::str_extract(string = string, pattern =  "\\[[^:digit:]+\\]")

[1] "[1]"  "[-1]" "[2]" 

stringr::str_extract(string = string, pattern =  "\\[[^(:digit:)]+\\]")

[1] "[1]"  "[-1]" "[2]" 

I also tried to append \\1 to the pattern in order to extract the first group and got the following error:

stringr::str_extract(string = string, pattern =  "\\[[^(?:digit:)]+\\]\\1")

Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Back-reference to a non-existent capture group. (U_REGEX_INVALID_BACK_REF)

I appreciate your time and apologize if this question is a duplicate.

Upvotes: 2

Views: 467

Answers (1)

Wiktor Stribiżew

Reputation: 626929

You can use

stringr::str_extract(string, "(?<=\\[)-?\\d+(?=\\])")

See the R demo
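A minimal end-to-end run of the first pattern on the question's vector, assuming stringr is installed. The lookbehind and lookahead only assert that the brackets are there, so they are not part of the returned match. The as.integer() step is an addition for illustration: str_extract() always returns a character vector, so a conversion is needed if actual numbers are wanted.

```r
library(stringr)

string <- c("[1] cate 1", "[-1] cate -1", "[2] cate 2")

# (?<=\\[) and (?=\\]) check for the brackets without consuming them,
# so only the optional sign and the digits end up in the match
nums_chr <- str_extract(string, "(?<=\\[)-?\\d+(?=\\])")
print(nums_chr)  # c("1", "-1", "2")

# Convert the character result to integers (illustrative extra step)
nums <- as.integer(nums_chr)
print(nums)      # c(1L, -1L, 2L)
```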

If you need to match integer or float numbers, you can use

stringr::str_extract(string, "(?<=\\[)-?\\d*\\.?\\d+(?=\\])")
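A quick check of the integer-or-float pattern, assuming stringr is installed. The input vector `x` is hypothetical and was chosen to cover a decimal, a negative decimal, a leading-dot decimal, a plain integer, and a string with no bracketed number (which yields NA).

```r
library(stringr)

# Hypothetical inputs: decimal, negative decimal, ".5", integer, no match
x <- c("[1.5] a", "[-0.25] b", "[.5] c", "[3] d", "no brackets")

# \\d*\\.?\\d+ accepts "3", "1.5" and ".5"; the lookarounds keep the brackets out
vals <- str_extract(x, "(?<=\\[)-?\\d*\\.?\\d+(?=\\])")
print(vals)              # c("1.5", "-0.25", ".5", "3", NA)

# as.numeric() turns ".5" into 0.5 and leaves NA as NA
print(as.numeric(vals))  # c(1.5, -0.25, 0.5, 3, NA)
```

Note that a form such as "[1.]" would not match, because the pattern requires at least one digit after the optional dot.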

Details:

  • (?<=\[) - a positive lookbehind that matches a location immediately preceded by [
  • -? - an optional - char
  • \d+ - one or more digits
  • \d*\.?\d+ - (second pattern, in place of \d+) zero or more digits, an optional . and then one or more digits
  • (?=\]) - a positive lookahead that matches a location immediately followed by ].
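The question also asks how to get only the first capturing group. str_extract() returns the whole match, which is why the answer relies on lookarounds. A capture-group alternative, sketched here assuming stringr is installed, is stringr::str_match(): it returns a character matrix whose first column is the full match and whose following columns hold the capture groups. Recent stringr releases also document a `group` argument for str_extract(); check the help page of the installed version before relying on it.

```r
library(stringr)

string <- c("[1] cate 1", "[-1] cate -1", "[2] cate 2")

# The brackets are matched literally but kept outside capture group 1
m <- str_match(string, "\\[(-?\\d+)\\]")

# Column 1: full match, e.g. "[1]"; column 2: capture group 1, e.g. "1"
full_match  <- unname(m[, 1])
first_group <- unname(m[, 2])

print(full_match)   # c("[1]", "[-1]", "[2]")
print(first_group)  # c("1", "-1", "2")
```

This avoids lookarounds entirely, which can be easier to read when the surrounding context is more complex than a single character.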

Upvotes: 2
