fish

Reputation: 1

Python regex to remove some (but not all) quote marks from a CSV file

I have a .csv file in which every field is wrapped in double quotes, but some fields also contain stray double quotes. UPDATE: my first description was a bit off, so I'm including two lines; the second one is the problem. In the original post I left out the double quotes at the end of the line. That matters for the first solution, which works otherwise but strips the quote before the \n:

"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","","2679 W. LONG CIRCLE","","LITTLETON","CO","80120","","3510862","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","","2822 THIRD "","","BOULDER","CO","80304","","3510863","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

I tried this code, but it also strips the quotes at the beginning and end of each line.

import re

with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    new.writelines(re.sub(r'(?<!,)"(?!,)', '', line) for line in old)
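To see why, here is a minimal sketch that runs the same pattern on a short, made-up line. At the start of a line there is no comma in front of the first quote, so (?<!,) succeeds; the last quote is followed by the newline rather than a comma, so (?!,) succeeds as well:

```python
import re

# The pattern from the question: remove every " that is neither
# preceded nor followed by a comma.
PATTERN = r'(?<!,)"(?!,)'

# A made-up two-field line, ending in the newline that iterating
# over a file object leaves on each line.
line = '"a","b"\n'

result = re.sub(PATTERN, '', line)
print(repr(result))  # 'a","b\n' (opening and closing quotes are gone)
```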

Any ideas are appreciated!

Upvotes: 0

Views: 329

Answers (3)

hwnd

Reputation: 70732

If you can use the csv module, begin by taking a look at "Removing in-field quotes in csv file".

If you'd rather do this with a regular expression, I suppose this will suffice:

re.sub(r'(?<=[^,])"(?=[^,])', '', line)

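This is the behavior the UPDATE in the question describes: when the last field on a line is empty (""), its closing quote is preceded by a quote and followed by the newline, neither of which is a comma, so it gets removed. A minimal sketch on a shortened, made-up version of the second sample line; adding \n to the lookahead's character class is an assumed tweak, not part of the answer:

```python
import re

# Pattern from the answer: a " with a non-comma character on both sides.
ANSWER = r'(?<=[^,])"(?=[^,])'
# Assumed variant: also refuse to match when the next character is a
# newline, so the closing quote of the last field survives.
VARIANT = r'(?<=[^,])"(?=[^,\n])'

# Shortened stand-in for the second sample line: one stray quote and an
# empty last field, followed by the newline a file iterator yields.
line = '"10","2822 THIRD "","","BOULDER",""\n'

print(repr(re.sub(ANSWER, '', line)))   # '"10","2822 THIRD ","","BOULDER","\n'
print(repr(re.sub(VARIANT, '', line)))  # '"10","2822 THIRD ","","BOULDER",""\n'
```

Because the lookbehind needs an actual character before the quote, the first quote on a line is never matched, so only the end of the line needs the extra exclusion.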

Upvotes: 1

CivFan

Reputation: 15102

Can you use the csv module instead of re? It probably already has this intelligence built in.

I'm rusty on csv. The code below is untested, but it may give you a starting point.

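import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('ColoSOS/2014_ContData.csv', newline='') as old, open('2014contx.csv', 'w', newline='') as new:
    reader = csv.reader(old, delimiter=',', quotechar='"')
    writer = csv.writer(new, quoting=csv.QUOTE_ALL)  # quote every field, as in the input
    writer.writerows(reader)  # write each parsed row back out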

Reference: https://docs.python.org/2/library/csv.html
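One caveat with this particular file: the default csv dialect reads "" inside a quoted field as an escaped quote, so in the second sample line the stray quote plus the closing quote of the address field are taken as a literal quote, and that field keeps going through the following delimiters. A quick check with io.StringIO (both sample lines are cut down to their first ten fields for brevity):

```python
import csv
import io

# The two sample lines from the question, truncated to ten fields.
sample = (
    '"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","",'
    '"2679 W. LONG CIRCLE","","LITTLETON"\n'
    '"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","",'
    '"2822 THIRD "","","BOULDER"\n'
)

# Default (excel) dialect: "" inside a quoted field is an escaped quote.
rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(len(row), row[7:])
# The first row has 10 fields; in the second, fields 8 to 10 are merged
# into one, so it comes back with only 8.
```

So with this input the stray quotes would probably need to be repaired first (for example with a regex) before csv can split the rows correctly.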

Upvotes: 0

donfuxx

Reputation: 11323

If you don't want to match the quotes at the beginning and end of the line, you could use this regex:

(?<!,|^)\"(?!,|$)

Instead of:

(?<!,)"(?!,)

See the demo here: http://regex101.com/r/cI7mW5
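That pattern is written for regex101's PCRE flavor. Python's re module requires every alternative inside a lookbehind to have the same width, and "," (one character) and "^" (zero width) do not, so re.compile raises re.error. A minimal sketch of one workaround is to split the lookbehind into two separate fixed-width ones; the helper name strip_stray_quotes is hypothetical:

```python
import re

# As written in the answer, the lookbehind mixes a 1-character
# alternative (",") with a zero-width one ("^"), which Python's re rejects.
try:
    re.compile(r'(?<!,|^)\"(?!,|$)')
except re.error as exc:
    print('re.error:', exc)

# Same conditions with the lookbehind split in two: not at the start of
# the line, not after a comma, and not followed by a comma or the end of
# the line ($ also matches just before a final \n).
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')


def strip_stray_quotes(line):
    """Hypothetical helper: drop quotes that cannot be field delimiters."""
    return STRAY_QUOTE.sub('', line)


line = '"DAVIS","2822 THIRD "","","BOULDER",""\n'
print(repr(strip_stray_quotes(line)))
# '"DAVIS","2822 THIRD ","","BOULDER",""\n'
```

To process the whole file, the question's own loop works unchanged with strip_stray_quotes(line) in place of the re.sub call.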

Upvotes: 0
