Reputation: 2374
There are plenty of regex questions out there, but I cannot solve the following in an elegant way.
I have the following vector and would like to extract only the numbers within the square brackets, that is, excluding the brackets themselves. The numbers may be negative. The question might also be:
How to extract only the first capturing group with the function str_extract from the {stringr} package?
string <- c("[1] cate 1", "[-1] cate -1", "[2] cate 2")
stringr::str_extract(string = string, pattern = "\\[[^:digit:]+\\]")
[1] "[1]" "[-1]" "[2]"
stringr::str_extract(string = string, pattern = "\\[[^(:digit:)]+\\]")
[1] "[1]" "[-1]" "[2]"
I also tried to append \\1 to the pattern in order to extract the first group and got the following error:
stringr::str_extract(string = string, pattern = "\\[[^(?:digit:)]+\\]\\1")
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
Back-reference to a non-existent capture group. (U_REGEX_INVALID_BACK_REF)
I appreciate your time and apologize if this question is a duplicate.
Upvotes: 2
Views: 467
Reputation: 626929
You can use
stringr::str_extract(string, "(?<=\\[)-?\\d+(?=\\])")
See the R demo
If you need to match integer or float numbers, you can use
stringr::str_extract(string, "(?<=\\[)-?\\d*\\.?\\d+(?=\\])")
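As a quick check of the float pattern, here is a minimal sketch. The input vector `float_string` is made up for illustration (it is not from the question), and the `as.numeric()` conversion is an optional extra step.

```r
library(stringr)

# Hypothetical input: bracketed integers and decimals, some negative
float_string <- c("[1.5] a", "[-0.25] b", "[.5] c", "[3] d")

# Same pattern as above: optional minus, optional integer part,
# optional dot, then at least one digit, all between [ and ]
float_chr <- str_extract(float_string, "(?<=\\[)-?\\d*\\.?\\d+(?=\\])")
print(float_chr)
# [1] "1.5"   "-0.25" ".5"    "3"

# str_extract() returns character values; convert if numbers are needed
float_num <- as.numeric(float_chr)
print(float_num)
# [1]  1.50 -0.25  0.50  3.00
```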
Details:
(?<=\[) - a positive lookbehind that matches a location immediately preceded with [
-? - an optional - char
\d+ - one or more digits
\d*\.?\d+ - matches zero or more digits, an optional . and then one or more digits
(?=\]) - a positive lookahead that matches a location immediately followed with ]
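If you would rather use a real capturing group, as the question asks, one option is `str_match()`, which returns the groups as matrix columns. The sketch below runs both approaches on the vector from the question; the variable names `lookaround` and `captured` are only illustrative.

```r
library(stringr)

string <- c("[1] cate 1", "[-1] cate -1", "[2] cate 2")

# Lookaround approach: the brackets are asserted but not consumed,
# so they are not part of the returned match
lookaround <- str_extract(string, "(?<=\\[)-?\\d+(?=\\])")
print(lookaround)
# [1] "1"  "-1" "2"

# Capture-group approach: str_match() returns a character matrix where
# column 1 is the full match and column 2 is the first capturing group
captured <- str_match(string, "\\[(-?\\d+)\\]")[, 2]
print(captured)
# [1] "1"  "-1" "2"
```

The `\\1` attempt in the question fails because a backreference re-matches the text of an earlier group inside the pattern; it does not select what gets returned, and that pattern contains no capturing group at all. Newer stringr releases (1.5.0 and later, as far as I know) also accept a `group` argument in `str_extract()`, so check your installed version before relying on it.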
Upvotes: 2