labjunky

Reputation: 831

Python regex: find and replace commas between quotation marks

I have a string,

line = '12/08/2013,3,"9,25",42:51,"3,08","12,9","13,9",159,170,"3,19",437,'

and I would like to replace the commas that appear between quotation marks with ":" and drop the quotation marks. The result I am looking for is

line = '12/08/2013,3,9:25,42:51,3:08,12:9,13:9,159,170,3:19,437,'

So far I have been able to match this pattern with,

import re
re.findall(r'("\d),(.+?")', line)

however, I guess I should be using

re.compile(...something..., line)
re.sub(':', line)

Does anyone know how to do this? Thanks, labjunky

Upvotes: 4

Views: 4799

Answers (3)

Wiktor Stribiżew

Reputation: 627082

There is also a generic regex solution for replacing any fixed (or non-fixed) pattern between double or single quotes: match the quoted substrings with the corresponding pattern, and pass a callable as the replacement argument to re.sub, where you can manipulate each match:

  1. Replacing commas between double quotes with colons and removing the double quotes (the current OP scenario):

    re.sub(r'"([^"]*)"', lambda x: x.group(1).replace(',', ':'), line)
    # => 12/08/2013,3,9:25,42:51,3:08,12:9,13:9,159,170,3:19,437,

  2. Replacing commas between double quotes with colons and keeping the double quotes:

    re.sub(r'"[^"]*"', lambda x: x.group(0).replace(',', ':'), line)
    # => 12/08/2013,3,"9:25",42:51,"3:08","12:9","13:9",159,170,"3:19",437,

  3. Replacing commas between double or single quotes with colons and keeping the quotes:

    re.sub(r''''[^']*'|"[^"]*"''', lambda x: x.group(0).replace(',', ':'), '''0,1,"2,3",'4,5',''')
    # => 0,1,"2:3",'4:5',

Also, if you need to handle escaped single and double quotes, consider using r"'[^\\']*(?:\\.[^\\']*)*'" (for single quoted substrings), r'"[^\\"]*(?:\\.[^\\"]*)*"' (for double quoted substrings) or for both - r''''[^\\']*(?:\\.[^\\']*)*'|"[^\\"]*(?:\\.[^\\"]*)*"''' instead of the patterns above.
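To illustrate the escaped-quote variant, here is a minimal sketch using the double-quote pattern quoted above. The names `DQ_ESCAPED` and `colons_inside_quotes`, and the sample string, are made up for the illustration; the sketch assumes quotes inside a field are escaped with a backslash.

```python
import re

# Double-quoted substring that may contain backslash-escaped characters
# such as \" or \\ (pattern taken from the paragraph above).
DQ_ESCAPED = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')

def colons_inside_quotes(text):
    """Replace commas with colons inside double-quoted substrings, keeping the quotes."""
    return DQ_ESCAPED.sub(lambda m: m.group(0).replace(',', ':'), text)

# Hypothetical input: the quoted field itself contains two escaped quotes.
sample = r'a,"b,\"c,d\",e",f'
print(colons_inside_quotes(sample))
```

With the simpler `"[^"]*"` pattern, the escaped quotes would be treated as field boundaries and the comma in `c,d` would be left untouched.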

Upvotes: 0

perreal

Reputation: 98068

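import re

line = '12/08/2013,3,"9,25",42:51,"3,08","12,9","13,9",159,170,"3,19",437,'
r = ""
# The capturing group makes re.split keep the quoted pieces in the result.
for t in re.split(r'("[^"]*")', line):
    # startswith avoids an IndexError on the empty pieces re.split can return.
    if t.startswith('"'):
        # Swap commas for colons, then strip the surrounding quotes.
        t = t.replace(",", ":")[1:-1]
    r += t
print(r)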

Prints:

12/08/2013,3,9:25,42:51,3:08,12:9,13:9,159,170,3:19,437,

Upvotes: 0

falsetru

Reputation: 369274

>>> import re
>>> line = '12/08/2013,3,"9,25",42:51,"3,08","12,9","13,9",159,170,"3,19",437,'
>>> re.sub(r'"(\d+),(\d+)"', r'\1:\2', line)
'12/08/2013,3,9:25,42:51,3:08,12:9,13:9,159,170,3:19,437,'

\1 and \2 refer to the first and second captured groups, i.e. the digits before and after the comma.
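Note that this pattern is specific to the shape of the data in the question: exactly one comma, with digits on both sides. A small sketch with made-up inputs shows what it does and does not touch:

```python
import re

pattern = r'"(\d+),(\d+)"'

# One comma between digits: replaced, quotes removed.
print(re.sub(pattern, r'\1:\2', 'x,"9,25",y'))
# Two commas inside the quotes: no match, string left unchanged.
print(re.sub(pattern, r'\1:\2', 'x,"1,2,3",y'))
# Non-digit content: no match, string left unchanged.
print(re.sub(pattern, r'\1:\2', 'x,"a,b",y'))
```

If the quoted fields may hold other content, a pattern such as `"[^"]*"` with a callable replacement is probably the safer choice.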


Non-regex solution:

>>> ''.join(x if i % 2 == 0 else x.replace(',', ':')
            for i, x in enumerate(line.split('"')))
'12/08/2013,3,9:25,42:51,3:08,12:9,13:9,159,170,3:19,437,'
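The split-based approach relies on the quotes being balanced: after splitting on `"`, the pieces at odd indices are the ones that were inside quotes. A sketch that wraps it in a function and checks that assumption (the function name and the `ValueError` are illustrative choices, not part of the original answer):

```python
def colons_inside_quotes_split(text, quote='"'):
    """Replace commas with colons inside quoted fields and drop the quotes."""
    parts = text.split(quote)
    # n quote characters give n + 1 pieces, so an even piece count
    # means an odd number of quotes, i.e. one is unmatched.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced quote characters")
    return ''.join(
        part if i % 2 == 0 else part.replace(',', ':')
        for i, part in enumerate(parts)
    )

line = '12/08/2013,3,"9,25",42:51,"3,08","12,9","13,9",159,170,"3,19",437,'
print(colons_inside_quotes_split(line))
```

Like the one-liner, this does not handle escaped quotes inside a field.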

Upvotes: 8
