When working with .csv files, cells that contain a ',' are typically surrounded by quotes, and sometimes every cell is quoted. I'm trying to isolate the cells that have quotes. Take this code for example:
```python
import re
example_row = 'Value1,"If you study, you will make an "A"","If you do not study, you will get an "F"",Value4'
# Alternatives: a quoted cell at the start of the row, in the middle, or at the end
quote_pattern = re.compile(r'^".*",|,".*",|,".*"$', re.DOTALL)
print(quote_pattern.findall(example_row))
```
The output for this is:
[',"If you study, you will make an "A"","If you do not study, you will get an "F"",']
My desired output is this:
[',"If you study, you will make an "A"",', ',"If you do not study, you will get an "F"",']
How do I change the regular expression to produce this? The goal here is not to split .csv files with a regex; rather, it is to understand how to write a regular expression for nested cases, such as quotes inside a quoted cell.
For your simple case, you may use this regex in Python:
```python
>>> import re
>>> row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'
>>> print(re.findall(r'(?:^|,)"(.*?)"(?=,|$)', row))
['If you study, you will get an "A".', 'If you do not study, you will get an "F"']
```
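A key difference from the question's pattern is the lazy `.*?`. As a quick sketch, this runs both patterns on the same row: the greedy `.*` extends to the last `",` in the row, so both quoted cells come back as one match, while `.*?` stops at the first closing quote that is followed by `,` or the end of the line.

```python
import re

row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'

# Greedy: .* runs to the end of the row and backtracks only as far as the
# LAST '",', so both quoted cells end up inside a single match.
greedy = re.findall(r'^".*",|,".*",|,".*"$', row, re.DOTALL)

# Lazy: .*? stops at the FIRST '"' that is followed by ',' or the end of the row.
lazy = re.findall(r'(?:^|,)"(.*?)"(?=,|$)', row)

print(len(greedy))  # 1
print(lazy)  # ['If you study, you will get an "A".', 'If you do not study, you will get an "F"']
```

The lookahead `(?=,|$)` matters too: it checks for the following `,` without consuming it, so the next cell can still match its leading `,`.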
RegEx Details:
- `(?:^|,)`: Match the start of the line or a `,`
- `"`: Match the opening `"`
- `(.*?)`: Match and capture 0 or more characters (lazy quantifier)
- `"`: Match the closing `"`
- `(?=,|$)`: Lookahead to assert that a `,` or the end of the line comes next

But as I commented above, prefer using a CSV parser module.
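The regex only works while the row stays simple. As a sketch (the row and field values below are hypothetical, not from the question), here is a case where the lazy pattern truncates a cell because an inner quote is followed by a comma, and how Python's standard `csv` module handles the same data when inner quotes are escaped by doubling them, which is what `csv.writer` does by default and what RFC 4180 describes:

```python
import csv
import io
import re

# Hypothetical row: the inner quote before ", then" is followed by a comma.
ambiguous = 'a,"He said "no", then left",b'
# The lazy regex stops at the first '"' followed by ',' and truncates the cell.
truncated = re.findall(r'(?:^|,)"(.*?)"(?=,|$)', ambiguous)
print(truncated)  # ['He said "no']

# Hypothetical fields, written with standard CSV escaping (inner quotes doubled).
fields = ["a", 'He said "no", then left', "b"]
buf = io.StringIO()
csv.writer(buf).writerow(fields)
line = buf.getvalue()  # csv.writer ends the row with '\r\n' by default
print(line, end="")  # a,"He said ""no"", then left",b

# Reading it back recovers every field, embedded commas and quotes intact.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == fields)  # True
```

This assumes the file uses doubled quotes for escaping; a file with unescaped inner quotes, like the rows in the question, is ambiguous for any parser.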