Kannan

Reputation: 3

Regex help needed to parse and extract property from an expression tree

Here is a valid property tree expression (it can be recursive):

rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)

In effect, a property can contain any number of properties and sub-properties. From this expression I would like to capture the following:

I tried a few approaches but couldn't get the repetitions to work recursively, hence I'm seeking help.

Thanks, Kannan

Upvotes: 0

Views: 151

Answers (1)

mkayaalp

Reputation: 2716

Because the nesting requires balanced parentheses, this is not a regular language, so a regular expression may not be the right tool. But if your engine supports recursive patterns (PCRE's (?R), for example) and you know what you are doing, this works:

([^:(), ]+)(?::\(((?R)?(?:, ?(?R))*)\))?

First we capture the name of the property: one or more characters other than ':', '(', ')', ',' or a space.

([^:(), ]+)
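A quick check of this name pattern on its own, sketched with Python's built-in re module (Python is just the illustration language here; NAME is a hypothetical variable name):

```python
import re

# Property name: one or more characters other than ':', '(', ')', ',' or a space.
NAME = re.compile(r"[^:(), ]+")

# The name stops at the ':' that introduces a subtree.
print(NAME.match("rootProperty:(prop1, prop2)").group())

# On a flat list, findall picks out each name and skips the ", " separators.
print(NAME.findall("prop1, prop2, etc"))
```

The first call prints rootProperty and the second prints ['prop1', 'prop2', 'etc'].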

A property may or may not have a subtree, so the next part is the optional subtree:

(?:           <--- do not capture
   :          <--- literal ':'
   \(         <--- literal '('
      ...     <--- some stuff inside
   \)         <--- literal ')'
)?            <--- it is optional
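Python's built-in re module does not support (?R), so it cannot run the full pattern. As a hedged, non-recursive sketch of the same name-plus-optional-subtree shape, the "some stuff inside" part can be replaced by [^()]* (text without parentheses). This only captures subtrees with no further nesting; on deeper input the outer property silently comes back as a leaf with an empty group 2. ONE_LEVEL is a hypothetical name.

```python
import re

# Name, then an optional ":( ... )" whose contents may not contain parentheses.
# This stands in for the recursive "some stuff inside" and handles one level only.
ONE_LEVEL = re.compile(r"([^:(), ]+)(?::\(([^()]*)\))?")

text = "prop1,subSubProp1:(prop1,prop2,etc),prop3"
for name, inner in ONE_LEVEL.findall(text):
    # findall returns '' for group 2 when the optional subtree did not match.
    print(name, repr(inner))
```

On this innermost-level input it yields the same three matches as the third pass of the worked example: prop1, subSubProp1 with group 2 'prop1,prop2,etc', and prop3.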

The part inside the parentheses captures a comma-separated list of properties:

(             <--- do capture
 (?R)?        <--- recursively match a property (optional, so an empty '()' is allowed)
 (?:          <--- do not capture
    , ?       <--- comma followed by optional space
    (?R)      <--- recursively match another property
 )*           <--- any number of comma separated properties
)             <--- end capture
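What this group achieves is a comma-separated list in which commas nested inside a subtree's parentheses are not separators. Without a recursive regex, the same top-level split can be sketched in Python by tracking parenthesis depth. split_top_level is a hypothetical helper, not a library function, and it assumes the parentheses are balanced:

```python
def split_top_level(items: str) -> list[str]:
    """Split a property list at commas that are not inside parentheses."""
    parts = []
    depth = 0  # current parenthesis nesting level
    start = 0  # start index of the current item
    for i, ch in enumerate(items):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # Top-level comma: close the current item; strip() drops the optional space.
            parts.append(items[start:i].strip())
            start = i + 1
    parts.append(items[start:].strip())
    return parts


print(split_top_level("prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc"))
```

This produces the same five items as Match 1 to Match 5 of the second pass in the worked example.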

For your example input:

Input:
    rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)
Match 1:
    rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)
    Group 1:
        rootProperty
    Group 2:
        prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc

You can then apply the same regex to group 2 of each match to capture the properties of that subtree, repeating until no match has a group 2. There may be a way to get at the captures made inside the recursion so you don't need these extra passes, but I don't know how.
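If the extra passes are a nuisance, or your engine lacks (?R) (Python's built-in re does), a hedged alternative is a small hand-written recursive-descent parser. This is a different technique from the regex above: it reuses only the name pattern and recurses into each subtree, i.e. into the text the regex captures as group 2, building the whole tree in one pass. parse and parse_list are hypothetical names, and the (name, children) tuple output is just one possible representation:

```python
import re

# Property name pattern taken from the regex above.
NAME = re.compile(r"[^:(), ]+")


def parse_list(text: str, pos: int = 0) -> tuple[list, int]:
    """Parse comma-separated properties starting at text[pos].

    Returns (properties, index just past the list). Each property is a
    (name, children) tuple, where children is [] for a leaf.
    """
    props = []
    while True:
        m = NAME.match(text, pos)
        if m is None:
            raise ValueError(f"expected a property name at index {pos}")
        name, pos = m.group(), m.end()
        children = []
        if text.startswith(":(", pos):
            pos += 2
            # Recurse into the subtree (what the regex captures as group 2).
            # Like "(?R)?", an empty subtree "()" is accepted.
            if not text.startswith(")", pos):
                children, pos = parse_list(text, pos)
            if not text.startswith(")", pos):
                raise ValueError(f"expected ')' at index {pos}")
            pos += 1
        props.append((name, children))
        # Separator: a comma followed by an optional space, as in ", ?".
        if not text.startswith(",", pos):
            return props, pos
        pos += 1
        if text.startswith(" ", pos):
            pos += 1


def parse(text: str) -> list:
    """Parse a whole expression, rejecting unbalanced or trailing input."""
    props, end = parse_list(text)
    if end != len(text):
        raise ValueError(f"unexpected input at index {end}")
    return props


tree = parse("rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)")
print(tree)
```

For the example input, tree holds rootProperty with its five children, and subProp1 and subSubProp1 carry their own child lists, the same structure the passes below recover one level at a time. Unlike the regex, it raises ValueError on input such as "a:(b" instead of matching part of it.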

Input:
    prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc
Match 1:
    prop1
Match 2:
    prop2
Match 3:
    subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3)
    Group 1:
        subProp1
    Group 2:
        prop1,subSubProp1:(prop1,prop2,etc),prop3
Match 4:
    prop3
Match 5:
    etc

Then,

Input:
    prop1,subSubProp1:(prop1,prop2,etc),prop3
Match 1:
    prop1
Match 2:
    subSubProp1:(prop1,prop2,etc)
    Group 1:
        subSubProp1
    Group 2:
        prop1,prop2,etc
Match 3:
    prop3

And finally:

Input:
    prop1,prop2,etc
Match 1:
    prop1
Match 2:
    prop2
Match 3:
    etc

https://regex101.com/r/WAXrFd/2

Upvotes: 1
