Nate Glenn

Reputation: 6744

Changing a regex to avoid quoted tokens

I have a handy-dandy regex from an answer to a previous SO question:

my $regex = qr/
    (sp\s+              #start with 'sp'
    \{                  #opening brace
      (                 #capture group 2
         (?:            #either
            \{ (?-1) \} #more braces, recursing into group 2
            |           #or
            [^{}]++     #non-brace characters
         )*             #0 or more times
      )                 #end of group 2
    \}                  #ending brace
    )                   #end of group 1
    /x;

I use it to extract textual structures of the form sp {} from a file, where the curly braces may be nested further. It correctly saves the following text in $1:

sp {foo {bar} baz}

But I've run into a problem: quoting. In the text I have, vertical bars can be used to quote:

sp {foo |}}}}bar}}}{{|}

That entire thing is one structure, but my current regex matches only sp {foo |}. The matter is further complicated because a vertical bar can be escaped within a quote using a backslash:

sp {foo |}\|bar|}

should also match. Does anyone have any ideas on how to soup up this regex to handle quotes and quote escaping?
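For anyone who wants to reproduce this, a minimal script along these lines should show the behaviour described above. The variable names (@inputs, @captured) are just for illustration and are not from my real code:

```perl
use strict;
use warnings;

# The pattern from the question, with the inline comments shortened.
my $regex = qr/
    (sp\s+
    \{
      (
         (?:
            \{ (?-1) \}   # nested braces, recursing into group 2
            |
            [^{}]++       # non-brace characters
         )*
      )
    \}
    )
    /x;

# Illustrative inputs: plain nesting, quoted braces, escaped quote character.
my @inputs = (
    'sp {foo {bar} baz}',
    'sp {foo |}}}}bar}}}{{|}',
    'sp {foo |}\|bar|}',
);

my @captured;
for my $input (@inputs) {
    # In list context the match returns the captures; keep only the first.
    my ($match) = $input =~ $regex;
    push @captured, $match;
    printf "%-26s => %s\n", $input, defined $match ? $match : '(no match)';
}
```

The first input is captured whole, while the two quoted inputs are cut off at the first closing brace inside the quote.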

Upvotes: 1

Views: 89

Answers (2)

ikegami

Reputation: 385655

Replace

[^{}]

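with the following, which matches either a run of characters that are neither braces nor vertical bars, or a whole |...| quoted section in which a backslash escapes the next character: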

(?: [^|{}]++
|   \| (?: [^\\|]++ | \\. )*+ \|
)
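Put together, the substitution might look like the sketch below. This assumes the rest of the pattern from the question is kept unchanged, so the existing ++ now applies to the new group. The test inputs and the @results array are illustrative only:

```perl
use strict;
use warnings;

my $regex = qr/
    (sp\s+
    \{
      (
         (?:
            \{ (?-1) \}                         # nested braces, recursing into group 2
            |
            (?: [^|{}]++                        # plain characters: no braces, no bars
            |   \| (?: [^\\|]++ | \\. )*+ \|    # a bar-quoted section with backslash escapes
            )++
         )*
      )
    \}
    )
    /x;

# Illustrative inputs, including the two problem cases from the question.
my @inputs = (
    'sp {foo {bar} baz}',
    'sp {foo |}}}}bar}}}{{|}',
    'sp {foo |}\|bar|}',
    'sp {a |{| b} tail',
);

my @results;
for my $input (@inputs) {
    # In list context the match returns the captures; keep only the first.
    my ($match) = $input =~ $regex;
    push @results, $match;
    printf "%-26s => %s\n", $input, defined $match ? $match : '(no match)';
}
```

Braces inside a quoted section are consumed by the quote branch, so they never reach the brace-counting alternative. One caveat: without the /s modifier, \\. will not match a backslash followed by a newline, so add /s if quoted sections in your data can contain an escaped line break.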

Upvotes: 1

Andy Lester

Reputation: 93636

Look at a CPAN module like Text::Balanced.
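As a rough sketch of what that could look like, extract_bracketed handles the plain nested-brace case. The input string and variable names here are made up for illustration. As far as I know, Text::Balanced only treats the usual quote characters as quotes inside brackets, so the vertical-bar quoting from the question would probably still need custom handling:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input; the leading "sp " is stripped by hand so the text
# starts at the opening brace, which is where extract_bracketed expects it.
my $input = 'sp {foo {bar} baz} trailing text';
(my $body = $input) =~ s/^sp\s+//;

# In list context this returns the balanced chunk and whatever follows it.
my ($extracted, $remainder) = extract_bracketed($body, '{}');

print "extracted: $extracted\n" if defined $extracted;
print "remainder: $remainder\n";
```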

Upvotes: 1
