Reputation: 51
I am using Perl to parse a CSV file. I parse it with a regex instead of a library. I know there are CSV parser libraries available, and more than one, but I decided I want to use a regex.
I created what I think is quite a nice, working regular expression for this. I originally wrote it for other applications that accept only a regular expression to parse files, and I wanted to reuse it here.
I want to use the beauty of Perl and put it in one line:
my $text = '"",hi there,"","2018-04-23,\" 13:14:53",,hostname,mac,"ipaddress",199';
my @data = $text =~ m/(?:^|,)(?:"(|.*?[^\\])"|([^,]*))(?:|$)/g;
However, when I do that as a one-liner, the Perl regex seems to capture even the non-capturing groups.
Here is the test code:
my $text = '"",hi there,"","2018-04-23,\" 13:14:53",,hostname,mac,"ipaddress",199';
my @data = $text =~ m/(?:^|,)(?:"(|.*?[^\\])"|([^,]*))(?:|$)/g;
foreach (@data) { print "a --${_}--\n"; }
while ($text =~ m/(?:^|,)(?:"(|.*?[^\\])"|([^,]*))(?:|$)/cg) {
    print "b --${1}${2}--\n";
}
The results of the "a" dump are:
a ----
a ----
a ----
a --hi there--
a ----
a ----
a --2018-04-23,\" 13:14:53--
a ----
a ----
a ----
a ----
a --hostname--
a ----
a --mac--
a --ipaddress--
a ----
a ----
a --199--
You can see extra empty lines there, as opposed to the correct results from the "b" dump:
b ----
b --hi there--
b ----
b --2018-04-23,\" 13:14:53--
b ----
b --hostname--
b --mac--
b --ipaddress--
b --199--
Has anybody run into this issue? Thank you for your answers, ideas, or bug findings.
Upvotes: 1
Views: 1574
Reputation: 51
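As soon as I posted my question I realised that the issue is not the non-capturing groups but the two capturing groups. In each match only one of them has a value and the other is undefined, yet a /g match in list context returns every capturing group for every match, so each field contributes two list elements.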
The culprit is this section of the regex:
(?:"(|.*?[^\\])"|([^,]*))
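A minimal sketch of the effect, assuming a hypothetical two-field sample string (not from the original data) so the output stays short. It uses the same regex as the question and prints undefined captures explicitly instead of letting them show up as empty strings:

```perl
use strict;
use warnings;

# Hypothetical sample: one quoted field, one unquoted field.
my $sample = '"a",b';

# Same pattern as in the question: group 1 captures quoted fields,
# group 2 captures unquoted ones. Only one of them participates per match.
my @pairs = $sample =~ m/(?:^|,)(?:"(|.*?[^\\])"|([^,]*))(?:|$)/g;

# Two matches times two capture groups gives four list elements.
for my $value (@pairs) {
    print defined $value ? "--$value--\n" : "--undef--\n";
}
```

This should print --a--, --undef--, --undef--, --b--: each match returns both groups, and the group that did not take part in the match comes back as undef.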
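Everything worked after replacing the non-capturing group with the branch-reset group (?|...), available since Perl 5.10, which numbers the captures in every alternative from the same index, so each match yields a single $1: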
(?|"(|.*?[^\\])"|([^,]*))
So the final working one-liner is:
my @data = $text =~ m/(?:^|,)(?|"(|.*?[^\\])"|([^,]*))(?:|$)/g;
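For completeness, here is a self-contained sketch that runs the one-liner against the sample line from the question and prints the field count followed by each field. Only the printing code is new; the input and the regex are taken from the post as-is:

```perl
use strict;
use warnings;

# Sample line from the question. In single quotes, \" stays a literal
# backslash followed by a double quote.
my $text = '"",hi there,"","2018-04-23,\" 13:14:53",,hostname,mac,"ipaddress",199';

# (?|...) is the branch-reset group: both alternatives store their capture
# in $1, so a list-context /g match returns one element per field.
my @data = $text =~ m/(?:^|,)(?|"(|.*?[^\\])"|([^,]*))(?:|$)/g;

printf "%d fields\n", scalar @data;
print "--$_--\n" for @data;
```

This should report 9 fields, the same values as the "b" dump in the question. Note that the escaped quote inside the date field is kept verbatim as \" and is not unescaped by the regex.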
Hopefully someone will find this information useful.
Upvotes: 2