Reputation: 1337
I know there are a few similar questions, but they did not help me, perhaps because I do not understand the basics of string manipulation well enough.
I have a string and I want to extract the contents of its first pair of square brackets.
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"
I have looked all over the internet to assemble the following code, but it gives me the contents of the second pair of brackets:
sub(".*\\[(.*)\\].*", "\\1", x, perl=TRUE)
The code returns 2. I expect to get 4.
I would appreciate it if someone could point out the missing piece.
---- update ----
Replacing .* with .*? in the first two instances worked, but I do not know why. I am leaving the question open for someone who can explain why this works:
sub(".*?\\[(.*?)\\].*", "\\1", x, perl=TRUE)
Upvotes: 0
Views: 516
Reputation: 27732
You can solve this with base R, but I usually prefer the functions from the stringr package when handling such problems.
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"
If you want only the first string between brackets, use str_extract:
stringr::str_extract(x, "(?<=\\[).+?(?=\\])")
# [1] "4"
If you want all the strings between brackets, use str_extract_all:
stringr::str_extract_all(x, "(?<=\\[).+?(?=\\])")
# [[1]]
# [1] "4" "2"
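For the base R route mentioned at the top, here is a minimal sketch that reuses the same lookaround pattern with regexpr/gregexpr and regmatches. The variable names pattern, first and all_matches are only for illustration. Lookbehind needs perl = TRUE in base R.

```r
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

# Same pattern as above: text preceded by "[" and followed by "]", matched lazily
pattern <- "(?<=\\[).+?(?=\\])"

# regexpr() locates only the first match; regmatches() returns the matched text
first <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# gregexpr() locates every match; the result is a list with one element per input string
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

print(first)
print(all_matches)
```

With the example string, first should hold the contents of the first pair of brackets and all_matches the contents of both.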
Upvotes: 0
Reputation: 33488
You're almost there:
sub("^[^\\]]*\\[(\\d+)\\].*", "\\1", x, perl=TRUE)
## [1] "4"
The original problem is that .* is greedy: it matches as much as possible of anything before the pattern goes on to match [, so the [ ends up being the last one in the string. Your solution was *?, the lazy (non-greedy, reluctant) version of *, which matches as little as it can. That is completely valid. The alternative I used is [^\\]]*, which translates to: match any run of characters that are not ].
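To see the three behaviours side by side, here is a small sketch that runs the patterns from this thread on the same input. The variable names greedy, lazy and negated are only for illustration.

```r
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

# Greedy: the leading .* grabs the whole string, then backtracks only as far
# as needed, so \\[ lands on the LAST opening bracket
greedy <- sub(".*\\[(.*)\\].*", "\\1", x, perl = TRUE)

# Lazy: .*? expands one character at a time, so \\[ lands on the FIRST
# opening bracket, and (.*?) stops at the first closing bracket after it
lazy <- sub(".*?\\[(.*?)\\].*", "\\1", x, perl = TRUE)

# Negated class: [^\\]]* cannot cross a "]", so the match is forced to use
# a "[" that appears before the first "]"
negated <- sub("^[^\\]]*\\[(\\d+)\\].*", "\\1", x, perl = TRUE)

print(c(greedy = greedy, lazy = lazy, negated = negated))
```

The greedy version returns the contents of the last pair of brackets, while the other two return the contents of the first pair. Note that the negated version as written only captures digits (\\d+), so it assumes the bracket contents are numeric.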
Upvotes: 1