Masood Sadat

Reputation: 1337

Extract inside of first square brackets

I know there are a few similar questions, but they did not help me, perhaps because I lack an understanding of the basics of string manipulation.

I have a string, and I want to extract the contents of its first pair of square brackets.

x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

I have looked all over the internet to assemble the following code, but it gives me the contents of the second pair of brackets:

sub(".*\\[(.*)\\].*", "\\1", x, perl=TRUE)

The code returns 2. I expect to get 4.

I would appreciate it if someone could point out the missing piece.

---- update ----

Replacing .* with .*? in the first two instances worked, but I do not know why. I am leaving the question open for someone who can explain why this works:

sub(".*?\\[(.*?)\\].*", "\\1", x, perl=TRUE)

Upvotes: 0

Views: 516

Answers (2)

Wimpel

Reputation: 27732

stringr

You can solve this with base R, but I usually prefer the functions from the stringr package when handling such problems.

x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

If you want only the first string between brackets, use str_extract:

stringr::str_extract(x, "(?<=\\[).+?(?=\\])")
# [1] "4"

If you want all the strings between brackets, use str_extract_all:

stringr::str_extract_all(x, "(?<=\\[).+?(?=\\])")
# [[1]]
# [1] "4" "2" 
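
For comparison, here is a minimal base R sketch of the same idea. It assumes the same lookaround pattern is acceptable with perl = TRUE (lookbehind needs the PCRE engine); the variable names pattern, first_match and all_matches are only illustrative.

```r
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

# Same lookaround pattern as above: lazily match whatever sits between [ and ]
pattern <- "(?<=\\[).+?(?=\\])"

# regexpr() locates only the first match; regmatches() extracts it
first_match <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(first_match)
# [1] "4"

# gregexpr() locates every match; the result is a list with one element per input string
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(all_matches)
# [1] "4" "2"
```

This avoids the stringr dependency at the cost of a slightly more verbose call.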

Upvotes: 0

s_baldur

Reputation: 33488

You're almost there:

sub("^[^\\]]*\\[(\\d+)\\].*", "\\1", x, perl=TRUE)
## [1] "4"

The original problem is that .* is greedy: it matches as much as possible, so it runs past the first [ and only backtracks as far as the last [ in the string. Your fix uses *?, the lazy (non-greedy, reluctant) version of *, which matches as little as it can and therefore stops at the first [.

That is completely valid. The alternative I used is [^\\]]*, which translates to "match any run of characters that are not ]".
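
To see the difference concretely, here is a small sketch that captures what the leading quantifier consumes in each case, then runs the three full patterns side by side. The variable names are illustrative, and the last pattern is a variant of the one above (it excludes [ in the prefix rather than ]), not the exact expression from this answer.

```r
x <- "cons/mod2/det[4]/rost2/rost_act[2]/Q2w5"

# What does the leading quantifier swallow before the literal [ ?
greedy_prefix <- sub("(.*)\\[.*", "\\1", x, perl = TRUE)
lazy_prefix   <- sub("(.*?)\\[.*", "\\1", x, perl = TRUE)
print(greedy_prefix)  # "cons/mod2/det[4]/rost2/rost_act"  (runs up to the last [)
print(lazy_prefix)    # "cons/mod2/det"                    (stops at the first [)

# Full extraction with each approach
greedy  <- sub(".*\\[(.*)\\].*", "\\1", x, perl = TRUE)            # original attempt
lazy    <- sub(".*?\\[(.*?)\\].*", "\\1", x, perl = TRUE)          # lazy quantifiers
negated <- sub("^[^\\[]*\\[([^\\]]*)\\].*", "\\1", x, perl = TRUE) # negated classes
print(c(greedy = greedy, lazy = lazy, negated = negated))
# greedy: "2", lazy: "4", negated: "4"
```

The negated-class version does not depend on lazy matching at all, because the prefix simply cannot contain a [.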

Upvotes: 1
