Reputation: 1
I have a .csv file in which every field is enclosed in double quotes, but some fields have stray double quotes inside them. UPDATE: my original post was a bit off, so I'm including two lines, the second of which is the problem. In the original I didn't show the double quotes at the end of each line, and that matters for the first solution, which otherwise works but strips the quote before \n:
"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","","2679 W. LONG CIRCLE","","LITTLETON","CO","80120","","3510862","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""
"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","","2822 THIRD "","","BOULDER","CO","80304","","3510863","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""
i tried this code, but it also strips the quotes at the beginning and end of the lines.
import re
with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    new.writelines(re.sub(r'(?<!,)"(?!,)', '', line) for line in old)
any ideas are appreciated!
Upvotes: 0
Views: 329
Reputation: 70732
If you can use the csv module, begin by taking a look at Removing in-field quotes in csv file.
If you're looking to do this by using regular expression, I suppose this will suffice.
re.sub(r'(?<=[^,])"(?=[^,])', '', line)
See working Demo
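As the question's UPDATE notes, this pattern also removes the last quote on each line, because the newline that follows it counts as a non-comma character. One possible workaround, offered only as a sketch, is to set the line ending aside before substituting and put it back afterwards. The names `INNER_QUOTE` and `clean_line` are made up for illustration, and the sample line is a shortened stand-in for the real data.

```python
import re

# Pattern from the answer above: a quote with a non-comma character on both sides.
INNER_QUOTE = re.compile(r'(?<=[^,])"(?=[^,])')

def clean_line(line):
    """Drop stray in-field quotes while leaving the line's final quote alone."""
    body = line.rstrip('\r\n')      # keep the line ending away from the lookahead
    ending = line[len(body):]       # whatever was stripped, restored unchanged
    return INNER_QUOTE.sub('', body) + ending

# Shortened, made-up line with a stray quote after THIRD.
sample = '"a","2822 THIRD "","","BOULDER",""\n'
print(clean_line(sample))
```

With the newline out of the way, the final quote has no character after it, so the lookahead cannot match there. Stray quotes that sit directly next to a comma inside a field would still be missed by this approach.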
Upvotes: 1
Reputation: 15102
Can you use the csv module instead of re? It probably already has this intelligence built-in.
I'm rusty on csv. The below code is not tested, but may give you a starting place.
import csv
with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
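    reader = csv.reader(old, delimiter=',', quotechar='"')
    # Re-quote every field on output so the file keeps its all-quoted layout
    writer = csv.writer(new, quoting=csv.QUOTE_ALL)
    writer.writerows(reader)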
Reference: https://docs.python.org/2/library/csv.html
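It may be worth checking how the reader copes with the problem row before relying on it. As far as I can tell, with default settings csv treats a doubled quote inside a quoted field as an escaped quote, so a stray quote right before a closing quote can cause neighbouring fields to merge rather than be repaired. The sketch below uses a shortened, made-up line (not the real data) and Python 3's `io.StringIO` to stand in for the file.

```python
import csv
import io

# Shortened, made-up line with the same defect as the second sample row:
# a stray quote right before the closing quote of the second field.
sample = '"a","2822 THIRD "","","BOULDER","CO"\n'

row = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'))

# Five fields are expected; fewer means the reader merged some of them.
print(len(row), row)
```

If the field count comes out short on your data as well, cleaning the stray quotes with a regex first and parsing with csv afterwards is probably the safer order.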
Upvotes: 0
Reputation: 11323
If you don't want to match quotes at beginning and end of the line you could use this regex:
(?<!,|^)\"(?!,|$)
Instead of:
(?<!,)"(?!,)
see demo here: http://regex101.com/r/cI7mW5
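One caveat if this is run from Python: the re module only accepts fixed-width lookbehinds, and `(?<!,|^)` mixes a one-character alternative with a zero-width one, so compiling it should fail with a "look-behind requires fixed-width pattern" error. The linked demo presumably uses a regex flavor that allows it. A sketch of an equivalent form for Python splits the lookbehind in two; `STRAY_QUOTE` and `strip_stray_quotes` are made-up names, and the sample line is a shortened stand-in for the real data.

```python
import re

# Same idea as the pattern above, with the lookbehind split in two so each
# part has a fixed width, which Python's re module requires.
STRAY_QUOTE = re.compile(r'(?<!,)(?<!^)"(?!,|$)')

def strip_stray_quotes(line):
    """Remove quotes that are not next to a comma or a line boundary."""
    return STRAY_QUOTE.sub('', line)

# Shortened, made-up line with a stray quote after THIRD.
sample = '"a","2822 THIRD "","","BOULDER",""\n'
print(strip_stray_quotes(sample))
```

Because `$` also matches just before a trailing newline, the last quote on each line is left in place even when the line still ends in \n.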
Upvotes: 0