Vincent Hahn

Reputation: 63

Replace commas enclosed in curly braces

I am trying to replace the commas enclosed in curly braces with semicolons.

Sample string:

text = "a,b,{'c','d','e','f'},g,h"

I am aware that it comes down to lookbehinds and lookaheads, but somehow it won't work like I want it to:

substr = re.sub(r"(?<=\{)(.+?)(,)(?=.+\})",r"\1;", text)

It returns:

a,b,{'c';'d','e','f'},g,h

However, I am aiming for the following:

a,b,{'c';'d';'e';'f'},g,h

Any idea how I can achieve this? Any help much appreciated :)

Upvotes: 5

Views: 2657

Answers (3)

Alex

Reputation: 21766

Below I have posted a solution that does not rely on a regular expression. It uses a stack (a list) to determine whether a character is inside curly brackets. Regular expressions are more elegant, but they can be harder to modify when requirements change. Note that the example below also works for nested brackets.

text = "a,b,{'c','d','e','f'},g,h"
output = ''
stack = []
for char in text:
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Check if we are inside a curly bracket
    if len(stack) > 0 and char == ',':
        output += ';'
    else:
        output += char
print(output)

This gives:

a,b,{'c';'d';'e';'f'},g,h
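
The same idea can be wrapped in a reusable function. The sketch below is one possible variant rather than the exact code above: it tracks nesting with an integer depth counter instead of a list, and the name replace_commas_in_braces is made up for illustration. It assumes the braces are balanced, except that a stray closing brace is ignored instead of raising an error.

```python
def replace_commas_in_braces(text):
    """Return text with every comma inside {...} replaced by a semicolon."""
    depth = 0    # number of currently open curly braces
    pieces = []  # collected characters, joined once at the end
    for char in text:
        if char == '{':
            depth += 1
        elif char == '}' and depth > 0:
            depth -= 1  # a stray '}' at depth 0 is left alone
        # A comma is replaced only while at least one brace is open
        pieces.append(';' if char == ',' and depth > 0 else char)
    return ''.join(pieces)


print(replace_commas_in_braces("a,b,{'c','d','e','f'},g,h"))
print(replace_commas_in_braces("a,{b,{c,d},e},f"))  # nested braces
```

Collecting the characters in a list and joining them once avoids building a new string on every iteration, which is one likely reason the plain for loop with += is slow on long inputs.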

You can also rewrite this as a map function if you use a global variable for the stack:

stack = []


def replace_comma_in_curly_brackets(char):
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Check if we are inside a curly bracket
    if len(stack) > 0 and char == ',':
        return ';'

    return char

text = "a,b,{'c','d','e','f'},g,h"
print(''.join(map(replace_comma_in_curly_brackets, text)))

Regarding performance, when running the above two methods and the regular expression solution proposed by @stribizhev on the test string at the end of this post, I get the following timings:

  1. Regular expression (@stribizhev): 0.38 seconds
  2. Map function: 26.3 seconds
  3. For loop: 251 seconds

This is the test string, which is 55,300,000 characters long:

text = "a,able,about,across,after,all,almost,{also,am,among,an,and,any,are,as,at,be,because},been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your" * 100000
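
If you want to repeat a comparison like this yourself, a rough sketch with timeit could look like the following. The input here is a much shorter, made-up string, the function names with_loop and with_regex are only illustrative, and the loop is a list-based variant rather than the exact code above, so the absolute numbers will not match the timings I reported.

```python
import re
import timeit

# Shortened, hypothetical test input; the original test string is far longer
sample = "a,able,{also,am,among},been,but" * 1000


def with_loop(text):
    depth = 0  # number of currently open curly braces
    out = []
    for char in text:
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
        out.append(';' if char == ',' and depth > 0 else char)
    return ''.join(out)


def with_regex(text):
    # Regular-expression approach from the other answer
    return re.sub(r"{[^{}]+}", lambda m: m.group(0).replace(",", ";"), text)


# Both approaches should agree before their speed is compared
assert with_loop(sample) == with_regex(sample)

for func in (with_loop, with_regex):
    seconds = timeit.timeit(lambda: func(sample), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```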

Upvotes: 2

bobble bubble

Reputation: 18490

If you don't have nested braces, it might be enough to just look ahead at each ,
if there is a closing } ahead without any opening { in between. Search for

,(?=[^{]*})

and replace with ;

  • , matches a comma literally
  • (?=...) is a lookahead that checks what follows the comma
  • [^{]* matches any number of characters that are not {
  • } requires a closing curly brace right after those characters

See demo at regex101
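
In Python this could be applied with re.sub, roughly as sketched below. The variable names are only for illustration, and the pattern still assumes there are no nested braces and no unmatched closing brace later in the string.

```python
import re

text = "a,b,{'c','d','e','f'},g,h"

# A comma is replaced only if a '}' follows with no '{' in between
pattern = re.compile(r",(?=[^{]*})")
result = pattern.sub(";", text)
print(result)  # a,b,{'c';'d';'e';'f'},g,h
```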

Upvotes: 1

Wiktor Stribiżew

Reputation: 626903

You can match the whole block {...} (with {[^{}]+}) and replace the commas only inside it, using a lambda as the replacement:

import re
text = "a,b,{'c','d','e','f'},g,h"
print(re.sub(r"{[^{}]+}", lambda x: x.group(0).replace(",", ";"), text))

See IDEONE demo

Output: a,b,{'c';'d';'e';'f'},g,h

By declaring lambda x, we get access to each match object and can read the whole matched value with x.group(0). Then all we need to do is replace each comma in it with a semicolon.

Python's built-in re module does not support recursive patterns, so this regex does not handle nested braces. To use a recursive pattern, you need the PyPI regex module. Something like m = regex.sub(r"\{(?:[^{}]|(?R))*}", lambda x: x.group(0).replace(",", ";"), text) should work.
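
To illustrate the limitation with the standard re module only, here is a small sketch on a made-up nested input. The helper name swap_commas is hypothetical; it does the same job as the lambda but as a named function. Because [^{}]+ cannot cross a brace, only the innermost block is matched, so the commas in the outer block stay unchanged.

```python
import re


def swap_commas(match):
    # match.group(0) is the whole matched {...} block
    return match.group(0).replace(",", ";")


nested = "a,{b,{c,d},e},f"
result = re.sub(r"{[^{}]+}", swap_commas, nested)
print(result)  # a,{b,{c;d},e},f
```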

Upvotes: 3
