Reputation: 3538
I'm working with some comma delimited text files. The file is comprised of approximately 400 rows and 94 columns all comma delimited and withing double quotes:
"H","9","YES","NO"....
My aim is to split the file up to its respective columns using the comma delimiter. Unfortunately, there are several fields within the rows that have the following format:
"4,5" or "2,5,8"
These fields are corrupting the column structure of the file when parse the file on the comma. So what I'd like to do is use regular expression to to do some sort of find and replace so that I can successfully parse my file. For example:
"H","9","YES","NO","4,5","Y","N" would become this:
"H","9","YES","NO","4|5","Y","N"
so that when I parse the file I would get seven columns instead of eight.
I wrote a regular expression that handles matching "2,5" or "2,3,4", but I'm not sure how to handle the replacing part.
Is it possible to accomplish this regular expressions?
Note: I'm using perl regular expressions.
Upvotes: 0
Views: 138
Reputation: 2456
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
Note that this also handles non-quoted fields.
On this data: "H",9,"YES","NO","4,5","Y","N"
$ perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' commasep
"H"|9|"YES"|"NO"|"4,5"|"Y"|"N"
Which can afterwards be split on "|":
$ perl -ne 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g;print join "---",split "\\|"' commasep
"H"---9---"YES"---"NO"---"4,5"---"Y"---"N"
Upvotes: 1
Reputation: 7057
Use a look around
(?<!"),(?!")
replacing it with a pipe.
which means
(?<!") - character before is not a "
, - match a comma
(?!") - character after is not a "
Upvotes: 1