m0nhawk


How to replace quote with multiple quotes between parentheses only?

I have a malformed CSV with unescaped quotes inside parentheses, like this:

1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"

The desired output is:

1, 2, 3, "4, 5, 6, (7, 8, 9, ""10, 11, 12"", 13), 14"

I can replace the first quote with a regex, but how do I do this for every quote inside the parentheses?

I can only think of:

import re
s = '''1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"'''
inner = re.search(r'\(.*\)', s).group(0)  # greedy: first "(" to last ")"
s = s.replace(inner, inner.replace('"', '""'))
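On the example line the greedy \(.*\) in this attempt works, because there is only one parenthesised group. The sketch below uses a hypothetical line with two quoted fields that each contain parentheses. The greedy pattern runs from the first "(" to the last ")" and also doubles the quotes that delimit the fields, merging them into one, while the non-greedy \(.*?\) matches each group separately. The helper name double_quotes is only for illustration.

```python
import re

# Hypothetical line: two quoted fields, each with a parenthesised group
line = 'x, "(1, "a")", "(2, "c")"'

def double_quotes(m):
    # Double every quote inside the matched span
    return m.group(0).replace('"', '""')

greedy = re.sub(r'\(.*\)', double_quotes, line)  # one match: first "(" to last ")"
lazy = re.sub(r'\(.*?\)', double_quotes, line)   # one match per group

print(greedy)  # x, "(1, ""a"")"", ""(2, ""c"")"
print(lazy)    # x, "(1, ""a"")", "(2, ""c"")"
```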

But this needs to be efficient, since the CSV is huge (> 100,000 lines) and only a few lines are malformed.


Answers (1)

DirtyBit

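import re

data = '1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"'

def replace(g):
    # Double every quote inside the matched parenthesised group
    return g.group(0).replace('"', '""')

# Non-greedy .*? stops at the first closing parenthesis after each "("
print(re.sub(r'\(.*?\)', replace, data))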

OUTPUT:

1, 2, 3, "4, 5, 6, (7, 8, 9, ""10, 11, 12"", 13), 14"
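Because the file is large and only a few lines are malformed, one option is to stream it line by line, compile the pattern once, and run the substitution only on lines that contain a parenthesis. This is a minimal sketch under those assumptions. The names PAREN_GROUP, double_quotes and fix_lines are hypothetical. Like the answer, it assumes that quotes inside parentheses are not already doubled on those lines, because already-doubled quotes would be doubled again.

```python
import io
import re

# Compiled once and reused for every line (non-greedy, as in the answer)
PAREN_GROUP = re.compile(r'\(.*?\)')

def double_quotes(m):
    # Double every quote inside the matched parenthesised group
    return m.group(0).replace('"', '""')

def fix_lines(lines):
    """Yield each line, doubling quotes inside (...) groups.

    Hypothetical helper: lines without both "(" and '"' are passed
    through untouched, so the regex only runs on the few candidates.
    """
    for line in lines:
        if '(' in line and '"' in line:
            line = PAREN_GROUP.sub(double_quotes, line)
        yield line

# Usage with an in-memory file standing in for the real CSV
src = io.StringIO(
    '1, 2, 3\n'
    '1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"\n'
)
for fixed in fix_lines(src):
    print(fixed, end='')
```

To rewrite a real file, something like dst.writelines(fix_lines(src)), with both files opened using newline='', should work. Python's re module already caches recently compiled patterns, so precompiling mostly saves a lookup. The larger savings come from skipping the regex on lines without parentheses and from not loading the whole file into memory.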
