Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

Question

I'm working with some comma delimited text files. The file is comprised of approximately 400 rows and 94 columns all comma delimited and withing double quotes:

"H","9","YES","NO"....

My aim is to split the file up to its respective columns using the comma delimiter. Unfortunately, there are several fields within the rows that have the following format:

"4,5"  or "2,5,8"

These fields are corrupting the column structure of the file when parse the file on the comma. So what I'd like to do is use regular expression to to do some sort of find and replace so that I can successfully parse my file. For example:

 "H","9","YES","NO","4,5","Y","N"  would become this:


"H","9","YES","NO","4|5","Y","N"

so that when I parse the file I would get seven columns instead of eight.

I wrote a regular expression that handles matching "2,5" or "2,3,4", but I'm not sure how to handle the replacing part.

Is it possible to accomplish this regular expressions?

Note: I'm using perl regular expressions.

Jeff Y · Accepted Answer

Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas instead:

s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g

Note that this also handles non-quoted fields.

On this data: "H",9,"YES","NO","4,5","Y","N"

$ perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' commasep
"H"|9|"YES"|"NO"|"4,5"|"Y"|"N"

Which can afterwards be split on "|":

$ perl -ne 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g;print join "---",split "\|"' commasep
"H"---9---"YES"---"NO"---"4,5"---"Y"---"N"

Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

Answers (2)

Related Questions