Reputation: 24168
I got a malformed CSV which has quotes inside parentheses, like this:
1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"
The desired output is:
1, 2, 3, "4, 5, 6, (7, 8, 9, ""10, 11, 12"", 13), 14"
I can think of replacing first one quote with regex, but how to do this for all quotes inside parentheses?
I can only think of:
s = '''1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"'''
s.replace(re.search(r'\(.*\)', s).group(0), re.search(r'\(.*\)', s).group(0).replace('"', '""'))
But I need this efficient enough as the CSV is huge (> 100'000) with only a few malformed lines.
Upvotes: 1
Views: 40
Reputation: 16772
import re
data = '1, 2, 3, "4, 5, 6, (7, 8, 9, "10, 11, 12", 13), 14"'
def replace(g):
return g.group(0).replace('"', '""')
print(re.sub(r'\(.*?\)', replace, data))
OUTPUT:
1, 2, 3, "4, 5, 6, (7, 8, 9, ""10, 11, 12"", 13), 14"
Upvotes: 1