Gabe Morris

Reputation: 872

How do you get a regular expression to match a case that is within a case?

When working with .csv files, quotes typically surround cells that contain a ',', and sometimes every cell is quoted. I'm trying to isolate the cells that have quotes. Take this code for example:

import re

example_row = 'Value1,"If you study, you will get an "A".","If you do not study, you will fail.",Value4'

quote_pattern = re.compile(r'^".*",|,".*",|,".*"$', re.DOTALL)

print(quote_pattern.findall(example_row))

The output for this is:

[',"If you study, you will get an "A".","If you do not study, you will fail.",']

My desired output is this:

[',"If you study, you will get an "A".",', ',"If you do not study, you will fail.",']

How do I change the regular expression to recognize this? The intent here is not to split up .csv files using regex; rather, it is to understand how to write a regular expression when one match case is nested within another.

Upvotes: 0

Views: 81

Answers (1)

anubhava

Reputation: 785256

For your simple case, you may use this regex in Python:

>>> import re
>>> row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'
>>> print( re.findall(r'(?:^|,)"(.*?)"(?=,|$)', row) )
['If you study, you will get an "A".', 'If you do not study, you will get an "F"']

RegEx Details:

  • (?:^|,): Match start or a ,
  • ": Match opening "
  • (.*?): Match and group 0 or more characters (lazy quantifier)
  • ": Match closing "
  • (?=,|$): Lookahead to assert that we have a , or line end ahead
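To make the effect of the lazy quantifier concrete, here is a small sketch that runs a greedy and a lazy version of the pattern on the row from the question. The third pattern is a hypothetical variant, not part of the answer above: it wraps the whole expression in a capturing lookahead so that matches may overlap and each one keeps its surrounding commas, which is the shape the question asks for. The variable names are only for illustration.

```python
import re

# Row taken from the question.
row = 'Value1,"If you study, you will get an "A".","If you do not study, you will fail.",Value4'

# Greedy: .* extends to the LAST '"' that is followed by ',' or the end,
# so both quoted cells are merged into a single match.
greedy = re.findall(r'(?:^|,)"(.*)"(?=,|$)', row)

# Lazy: .*? stops at the FIRST '"' that is followed by ',' or the end,
# so each quoted cell is matched separately.
lazy = re.findall(r'(?:^|,)"(.*?)"(?=,|$)', row)

# Hypothetical variant: the outer lookahead consumes nothing, so two
# matches can share the comma between adjacent cells.
with_commas = re.findall(r'(?=((?:^|,)".*?"(?:,|$)))', row)

print(greedy)       # one merged element
print(lazy)         # one element per quoted cell, quotes stripped
print(with_commas)  # one element per quoted cell, delimiters kept
```

The greedy call returns a single string spanning both cells, while the lazy call returns two. All three patterns still assume that a `"` followed by a comma always ends a cell, which real CSV data does not guarantee.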

But, as I commented above, I would prefer using a CSV parser module.
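For comparison, a minimal sketch of the parser route using Python's standard `csv` module. It assumes the file follows the usual CSV convention of doubling quotes inside a quoted cell (`""A""`), which the row in the question does not do, so the sample row below is an adjusted, hypothetical version of it.

```python
import csv
import io

# Assumption: embedded quotes are escaped by doubling them, as standard
# CSV writers do. This differs from the unescaped row in the question.
row = 'Value1,"If you study, you will get an ""A"".","If you do not study, you will fail.",Value4'

# csv.reader accepts any iterable of lines; StringIO stands in for a file.
cells = next(csv.reader(io.StringIO(row)))

print(cells)
# ['Value1', 'If you study, you will get an "A".', 'If you do not study, you will fail.', 'Value4']
```

The parser handles commas and escaped quotes inside cells without any pattern matching, but it does not report which cells were quoted in the source, so the regex approach may still be needed if that information matters.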

Upvotes: 1
