befunkt

Reputation: 131

RegEx - is recursive substitution possible using only a RegEx engine? Conditional search replace

I'm editing some data, and my end goal is to conditionally substitute , (comma) characters with . (dot). I have a crude solution working now, so this question is strictly about better methods in practice, and about determining what is possible with a regex engine alone, outside of a full programming environment.

I gave it a good college try, but 6 hours is enough mental grind for a Saturday, and I'm throwing in the towel. :)

I've been through about 40 SO posts on regex recursion, substitution, etc, the wiki.org on the definitions and history of regex and regular language, and a few other tutorial sites. The majority is centered around Python and PHP.

The working, crude regex (facilitating loops / search and replace by hand):

(^.*)(?<=\()(.*?)(,)(.*)(?=\))(.*$)

A snip of the input:

room_ass=01:macro_id=01: name=Left, pgm_audio=0, usb=0, list=(1*,3,5,7,),
room_ass=01:macro_id=02: name=Right, pgm_audio=1, usb=1, list=(2*,4,6,8,),
room_ass=01:macro_id=03: name=All, pgm_audio=1, list=(1,2*,3,4,5,6,7,8,),

And the desired output:

room_ass=01: macro_id=01: name=Left, pgm_audio=0, usb=0, list=(1*.3.5.7.),
room_ass=01: macro_id=02: name=Right, pgm_audio=1, usb=1, list=(2*.4.6.8.),
room_ass=01: macro_id=03: name=All, pgm_audio=1, list=(1.2*.3.4.5.6.7.8.),

That's all. Just replace the , with ., but only inside ( ).

This is one conceptual (not working) method I'd like to see, where the middle group<3> would loop recursively:

(^.*)(?<=\()([^,]*)([,|\d|\*]\3.*)(?=\))(.*$)
                   (          ^  )      

..where each recursive iteration would shift across the data, either 1 char or 1 comma at a time:

room_ass=01:macro_id=01: name=Left, pgm_audio=0, usb=0, list=(1*,3,5,7,),
                                                      iter 1-|  ^      |
                                                           2-|    ^    |
                                                           3-|      ^  |
                                                           4-|        ^|

Or, a much simpler approach would be to just tell it to mask/select every , between the ( ), but I struck out on figuring that one out. I use text editors a lot for little data editing tasks like this, so I'd like to verify that SublimeText can't do it before I dig into Python.

All suggestions and criticisms welcome. Be gentle. <--#n00b Thanks in advance! -B

Upvotes: 3

Views: 358

Answers (2)

bobble bubble

Reputation: 18490

Not much magic needed. Just check whether there's a closing ) ahead, without any ( in between.

,(?=[^)(]*\))

See this demo at regex101

However, it does not check for an opening (. It's a common approach, and this question is probably a duplicate.
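If you do end up in Python, here is a minimal sketch of the same lookahead with the standard `re` module, run against one line of the sample input. The function names `dots_by_lookahead` and `dots_by_group` are made up for illustration. The second function is a different technique, not the pattern above: it matches a whole ( ... ) group first and swaps the commas in a callback, so it also requires the opening (. It assumes the parentheses are not nested.

```python
import re

# Lookahead pattern from this answer: a comma that is followed by a
# closing ")" with no "(" or ")" in between.
COMMA_BEFORE_CLOSE = re.compile(r",(?=[^)(]*\))")

def dots_by_lookahead(text):
    """Replace commas that have a closing parenthesis ahead of them."""
    return COMMA_BEFORE_CLOSE.sub(".", text)

# Hypothetical stricter variant (assumes no nested parentheses):
# match a complete "( ... )" group, then replace the commas inside it.
PAREN_GROUP = re.compile(r"\([^()]*\)")

def dots_by_group(text):
    """Replace commas only inside complete, non-nested ( ... ) groups."""
    return PAREN_GROUP.sub(lambda m: m.group(0).replace(",", "."), text)

line = "room_ass=01:macro_id=01: name=Left, pgm_audio=0, usb=0, list=(1*,3,5,7,),"
print(dots_by_lookahead(line))
print(dots_by_group(line))

# A stray ")" without an opening "(": only the lookahead version
# rewrites the comma here.
print(dots_by_lookahead("a,b) c"))
print(dots_by_group("a,b) c"))
```

On well-formed input like the sample, both functions should give the same result; they only differ when a ) appears without a matching (.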

Upvotes: 1

Dean Taylor

Reputation: 41981

This is a complete guess because I don't use SublimeText; the assumption here is that SublimeText uses PCRE regular expressions.

Note that you mention "recursive", but I don't believe you mean Regular Expression Recursion; that doesn't fit the problem here.

Something like this might work... You'll need to test to make sure this isn't matching other things in your document and to see if SublimeText even supports this...

This is based on using the \K operator to "keep" what comes before it out of the match - you can find other uses of it as a PCRE alternative (workaround) to variable-length look-behinds not being supported by PCRE.

Regular Expression

\((?:(?:[^,\)]+),)*?(?:[^,\)]+)\K,

Regex Description

  • Match the opening parenthesis character \(
  • Match the regular expression below (?:(?:[^,\)]+),)*?
    • Between zero and unlimited times, as few times as possible, expanding as needed (lazy) *?
    • Match the regular expression below (?:[^,\)]+)
      • Match any single character NOT present in the list below [^,\)]+
        • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
        • The literal character “,” ,
        • The closing parenthesis character \)
    • Match the character “,” literally ,
  • Match the regular expression below (?:[^,\)]+)
    • Match any single character NOT present in the list below [^,\)]+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
      • The literal character “,” ,
      • The closing parenthesis character \)
  • Keep the text matched so far out of the overall regex match \K
  • Match the character “,” literally ,
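To try this outside an editor, here is a rough Python sketch. Python's built-in `re` module has no \K, so the sketch captures the part before the comma and puts it back with `\1`; that substitution is my workaround, not the pattern above verbatim. The helper name `replace_until_stable` is made up. Because every match has to start at the opening (, a single replace-all pass appears to change only the first remaining comma in each group, so the sketch repeats the pass until nothing changes.

```python
import re

# Same structure as the pattern above, but the text before the comma is
# captured in group 1 instead of being dropped with \K.
FIRST_COMMA = re.compile(r"(\((?:[^,)]+,)*?[^,)]+),")

def replace_until_stable(text):
    """Repeat the substitution until a pass makes no replacement.

    Returns the final text and the number of passes that changed it.
    """
    passes = 0
    while True:
        text, count = FIRST_COMMA.subn(r"\1.", text)
        if count == 0:
            return text, passes
        passes += 1

line = "room_ass=01:macro_id=01: name=Left, pgm_audio=0, usb=0, list=(1*,3,5,7,),"

# One pass only touches the first comma inside the parentheses.
print(FIRST_COMMA.sub(r"\1.", line))

# Looping finishes the job.
result, passes = replace_until_stable(line)
print(result, passes)
```

In an editor the equivalent would be pressing Replace All repeatedly until no matches are left.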

Upvotes: 1
