Regular expression to capture different lines

Question

I'm trying to find a better way to capture variable values from a file that stores some information, but I'm running into problems with line breaks and spaces. For example, the DataSetList variable can store its value in two different ways: on a single line, or split across several lines joined with a literal +.

Input

Templates = <
  item
    Name = 'fruits'
    TemplateList = '7,12'
  end>
Surveys = <
  item
    ID = 542
    Name = 'apple'
  end
  item
    ID = 872
    Name = 'banana'
    DataSetList = '873,887,971,1055'
    PluginInfo = {something}
  end
  item
    ID = 437
    Name = 'cherry'
    DataSetList = 
      '438,452,536,620,704,788,1143,1179,1563,1647,1731,1839,1875,1851,' +
      '1863,2060,2359,2443,2469,2620'
    PluginInfo = {something}
  end>

The only way I've found to capture the values of the ID, Name, and DataSetList variables stored in an 'item ... end' block is the following (my approach):

Expression

ID[\s\=]*(?P<UID>\d*)\s*Name[\s\=]*'(?P<Name>.*)'\s*DataSetList[\s\=]*(?P<DataSetList>(?:'[\d\,]*'[\s\+]*)*)

ID[\s\=]*(?P<UID>\d*)                                    # capture ID
\s*                                                      # match spaces
Name[\s\=]*'(?P<Name>.*)'                                # capture Name
\s*                                                      # match spaces
DataSetList[\s\=]*(?P<DataSetList>(?:'[\d\,]*'[\s\+]*)*) # capture DataSetList
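For reference, a minimal runnable sketch of this expression in Python, assuming the named groups are UID, Name, and DataSetList (the keys that appear in the output) and that the file content is already loaded into a string. SAMPLE and PATTERN are illustrative names, and SAMPLE holds only the Surveys block from the input.

```python
import re

# Assumption: the file content is available as one string.
SAMPLE = """\
Surveys = <
  item
    ID = 542
    Name = 'apple'
  end
  item
    ID = 872
    Name = 'banana'
    DataSetList = '873,887,971,1055'
    PluginInfo = {something}
  end
  item
    ID = 437
    Name = 'cherry'
    DataSetList = 
      '438,452,536,620,704,788,1143,1179,1563,1647,1731,1839,1875,1851,' +
      '1863,2060,2359,2443,2469,2620'
    PluginInfo = {something}
  end>
"""

# re.VERBOSE lets the commented, multi-line form be used directly.
PATTERN = re.compile(r"""
    ID[\s\=]*(?P<UID>\d*)                                     # capture ID
    \s*                                                       # match spaces
    Name[\s\=]*'(?P<Name>.*)'                                 # capture Name
    \s*                                                       # match spaces
    DataSetList[\s\=]*(?P<DataSetList>(?:'[\d\,]*'[\s\+]*)*)  # capture DataSetList
""", re.VERBOSE)

# Items without a DataSetList (such as 'apple') do not match at all.
matches = [m.groupdict() for m in PATTERN.finditer(SAMPLE)]

for found in matches:
    print(found)
```

The DataSetList group here still contains the quotes, the + sign, and the trailing whitespace, because a single capturing group always returns one contiguous piece of the input.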

My approach output

{'UID': '872',
 'Name': 'banana',
 'DataSetList': "'873,887,971,1055'
    "}

{'UID': '437',
 'Name': 'cherry',
 'DataSetList': "'438,452,536,620,704,788,1143,1179,1563,1647,1731,1839,1875,1851,' +
      '1863,2060,2359,2443,2469,2620'
    "}

Problem

I don't think my approach is good, because the named capturing group DataSetList also captures spaces, line breaks, and the literal +, so the values still require post-processing.
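To make the post-processing concrete, here is one possible two-pass sketch in Python: first isolate each 'item ... end' block, then pull the fields out of the block and join the quoted fragments. The names parse_items, ITEM_RE, and QUOTED_RE are hypothetical, and the sketch assumes that the word "end" never appears on its own inside a quoted value and that values never contain a single quote.

```python
import re

# Pass 1: one match per 'item ... end' block (non-greedy, spans line breaks).
ITEM_RE = re.compile(r"\bitem\b(.*?)\bend\b", re.DOTALL)
# Used to pull the text between each pair of single quotes.
QUOTED_RE = re.compile(r"'([^']*)'")


def parse_items(text):
    """Return one dict per item block; missing fields become None or []."""
    items = []
    for block in ITEM_RE.findall(text):
        # Pass 2: look for each field independently inside the block.
        uid = re.search(r"\bID\s*=\s*(\d+)", block)
        name = re.search(r"\bName\s*=\s*'([^']*)'", block)
        dsl = re.search(r"\bDataSetList\s*=\s*((?:'[\d,]*'[\s+]*)+)", block)
        # Join the quoted fragments, which drops quotes, '+', and whitespace.
        joined = "".join(QUOTED_RE.findall(dsl.group(1))) if dsl else ""
        items.append({
            "UID": uid.group(1) if uid else None,
            "Name": name.group(1) if name else None,
            "DataSetList": [n for n in joined.split(",") if n],
        })
    return items


# Two items taken from the input above.
SAMPLE = """\
  item
    ID = 542
    Name = 'apple'
  end
  item
    ID = 437
    Name = 'cherry'
    DataSetList = 
      '438,452,536,620,704,788,1143,1179,1563,1647,1731,1839,1875,1851,' +
      '1863,2060,2359,2443,2469,2620'
    PluginInfo = {something}
  end>
"""

items = parse_items(SAMPLE)
for item in items:
    print(item["UID"], item["Name"], item["DataSetList"])
```

Splitting the work this way also keeps items without a DataSetList (such as 'apple'), which the single expression skips entirely.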

Any approaches or ideas to improve my regular expression would be very helpful. Unfortunately, my knowledge of regex isn't as deep as I would like it to be, and I'd be very interested to see how this can be done in other ways.
