jonbon

Reputation: 1200

RegEx to Match Characters Directly Before Keyword and Directly Afterwards

I'm not good enough with RegEx yet. I've been searching around and trying to write my own, but haven't succeeded. I want to search through a string like this:

Shelf-15-Contains(Item)10-Depo91

I want to search for the () part, which can be done with:

/\(([^()]+)\)/g

When the RegEx finds a () group, I want to grab the text that comes right before it, the () with everything inside, and whatever follows directly afterwards. So the result should be:

Contains(Item)10

EDIT: Also, the RegEx I have above only matches a () that has no () nested inside it, so once I figure out how to match what comes before and after, I should be able to run it repeatedly if there are multiple nested layers?

Upvotes: 0

Views: 62

Answers (3)

Wiktor Stribiżew

Reputation: 626728

IMHO, there is no need to overcomplicate this. Here is a regex that matches the word before the parentheses (here, Contains), everything inside them (with or without nested ones, balanced or not), and the optional digits. It assumes the construct is followed by a - or by the end of the string:

\w+\(.*?\)\d*(?=-|$)


Input:

Shelf-15-Contains(I(t)e(m))10-Depo91
Shelf-15-Contains(I(t)e(m))-Depo91

Matches:

Contains(I(t)e(m))10
Contains(I(t)e(m))
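
If you want to check this outside a regex tester, here is a minimal sketch in Python (my choice of language, not something the question specifies). The helper name find_constructs is made up for illustration, and re.MULTILINE is an assumption so that $ matches at the end of each input line:

```python
import re

# Pattern from this answer. re.MULTILINE (an assumption) lets $ match at
# the end of every line, not only at the end of the whole string.
PATTERN = re.compile(r"\w+\(.*?\)\d*(?=-|$)", re.MULTILINE)


def find_constructs(text):
    """Return each word(...)digits run that is followed by '-' or a line end."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(text)


sample = (
    "Shelf-15-Contains(I(t)e(m))10-Depo91\n"
    "Shelf-15-Contains(I(t)e(m))-Depo91"
)
print(find_constructs(sample))
```

One thing to watch: . also matches -, so on input like A(b)x-C(d)-E the lazy .*? runs across the hyphen and the whole A(b)x-C(d) comes back as a single match.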

Upvotes: 0

Loic

Reputation: 650

For matching before and after, use additional capturing groups:

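use strict;
use warnings;

my $str = 'Shelf-15-Contains(Item)10-Depo91';    # sample string from the question

while (
  $str
  =~ m/
        ([^-()]*)        # before: excludes parentheses, so a(b)c(d)e gives one match per group
        \( ( [^()]* ) \) # (in): one pair of parentheses with nothing nested inside
        (?= ([^-]*) )    # after: captured inside a lookahead, so it is not consumed
     /gx
) {
    my ($before, $in, $after) = ($1, $2, $3);
    print "${before}($in)$after\n";    # example action: prints Contains(Item)10
}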

Nested constructs cannot be recognized by regular expressions in the strict sense (a finite state machine accepting a string). Perl's regex engine offers additional constructs for recognizing balanced parentheses, but they are rather difficult to use.

http://perldoc.perl.org/perlre.html#Extended-Patterns gives examples of how to parse balanced parentheses, under (??{ code }) and (?PARNO).
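
If those extended constructs are not available (Python's standard re module, for example, has no recursion), a plain depth counter does the same job without a regex. This is only a sketch of that alternative, not the perlre approach, and the function name balanced_groups is hypothetical:

```python
def balanced_groups(text):
    """Return (start, end) index pairs for each outermost balanced (...) span."""
    spans = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # remember where the outermost group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:         # the outermost group just closed
                spans.append((start, i + 1))
        # a ')' seen at depth 0 is unbalanced and is simply skipped
    return spans


s = "Shelf-15-Contains(I(t)e(m))10-Depo91"
print([s[a:b] for a, b in balanced_groups(s)])
```

An opening parenthesis that is never closed produces no span, so unbalanced input is skipped rather than reported; whether that is the right behavior depends on your data.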

Finally, the string you want to parse seems to be a hyphen-separated list. Try to find a formal grammar for what you want to parse; it will help you design your program.

If you don't need to handle a(b)c(d)e, then you can simplify (?= ([^-]*) ) to ([^-]*).
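
For comparison, here is roughly the same three-group idea in Python. Treat it as a sketch under assumptions: the name split_matches is made up, and this variant excludes parentheses from the before and after groups (my choice) so that a(b)c(d)e is split into one match per group:

```python
import re

# Three capture groups in verbose mode; the "after" group sits inside a
# lookahead, so its text can still serve as "before" for the next match.
PATTERN = re.compile(
    r"""
    ([^-()]*)        # before: text back to the previous hyphen or parenthesis
    \(([^()]*)\)     # in: contents of one non-nested pair of parentheses
    (?=([^-()]*))    # after: captured by the lookahead, not consumed
    """,
    re.VERBOSE,
)


def split_matches(text):
    """Return a (before, inside, after) tuple for each parenthesized group."""
    return [m.groups() for m in PATTERN.finditer(text)]


print(split_matches("Shelf-15-Contains(Item)10-Depo91"))
print(split_matches("a(b)c(d)e"))
```

Because the lookahead does not consume anything, the c in a(b)c(d)e shows up twice: as the after part of the first match and as the before part of the second.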

Upvotes: 1

Toto

Reputation: 91385

How about:

/([^-]+\([^()]+\)[^-]+)/g
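
A quick way to try it, sketched in Python (an assumption, the question does not name a language; segments_with_parens is a made-up helper name):

```python
import re

# Same pattern as above, without the /.../g delimiters; findall plays the
# role of the g flag and returns the single capture group for each match.
PATTERN = re.compile(r"([^-]+\([^()]+\)[^-]+)")


def segments_with_parens(text):
    """Return each hyphen-delimited segment that contains a non-nested (...)."""
    return PATTERN.findall(text)


print(segments_with_parens("Shelf-15-Contains(Item)10-Depo91"))
```

Note that every + requires at least one character, so a segment with nothing after the closing parenthesis, such as Contains(Item) directly followed by -, is not matched.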

Upvotes: 1
