I have a comma-separated CSV file in which some fields span multiple lines. The file has 4 columns: the first 3 may contain multiline text, and I want to group the rows by those first 3 columns, collecting the values of the last column.
Input CSV (/tmp/test.tmp.csv):
"Total Sections",ota,4!n,01
"Input History",80,"HHMM28!c1!a[4!a]
6X
9X]",1
"T (MR)",17t,(MTR),02
"Input History",80,"HHMM28!c1!a[4!a]
6X
9X]",2
Reference,:4!t/1c,:(Text1)/(Text2),30
Reference,:4!t/1c,:(Text1)/(Text2),32
The CSV above contains 6 records; records 2 and 4 span multiple lines.
Expected output (the grouped last-column values are separated by a space):
"Total Sections",ota,4!n,01
"Input History",80,"HHMM28!c1!a[4!a]
6X
9X]",1 2
"T (MR)",17t,(MTR),02
Reference,:4!t/1c,:(Text1)/(Text2),30 32
My Perl script (it uses the first 3 fields as a hash key, pushes the last field onto that key's list of values, and prints the result to a CSV, without joining the values yet):
#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV;
my %hash;
my @array;
my $in_qfn = "/tmp/test.tmp.csv";
my $out_qfn = "/tmp/test.out.tmp.csv";
# Li: Parsing multilines in csv
my $parser = Text::CSV->new({
binary => 1,
auto_diag => 1,
sep_char => ','
});
# Li: output multilines to csv
my $csvo = Text::CSV->new({
binary => 1,
eol => "\r\n",
sep_char => ','
});
open(my $data, '<:encoding(utf8)', $in_qfn) or die "Could not open $in_qfn: $!\n";
open(my $sts, '>:encoding(utf8)', $out_qfn) or die "Could not write $out_qfn: $!\n";
while (my $fields = $parser -> getline($data)) {
my $fz = $fields->[0];
my $fo = $fields->[1];
my $ft = $fields->[2];
my $fth = $fields->[3];
my @flds = ($fz, $fo, $ft, $fth);
# Li: push the first 3 columns as key and the last column as value
push(@{$hash{@flds[0..2]} }, $flds[3]);
}
# Li: print to output csv without join yet
for my $k (sort keys %hash) {
my @fldsAll = ($k, @{ $hash{$k}});
print("###LI### 1: key: $k, value: @fldsAll\n");
$csvo -> print($sts, \@fldsAll);
}
However, the script doesn't work: part of the hash key gets lost. I suspect the multiline fields, the special characters, or the fact that not every field is double-quoted.
Actual (incorrect) output:
(MTR),02
4!n,01
:(Text1)/(Text2),30,32
"HHMM28!c1!a[4!a]
6X
9X]",1,2
Any idea how to fix it? A completely new Perl solution would also be appreciated.
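You can't use an array slice as a hash key the way you've done. A hash subscript is evaluated in scalar context, and an array slice in scalar context returns only its last element, so $hash{@flds[0..2]} uses just $flds[2] as the key instead of all three values. That is exactly why your output only shows the third field.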
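A minimal core-Perl sketch of this behaviour, using the question's first record as sample data; running it should show that only the third field ends up as the key:

```perl
use strict;
use warnings;

# Fields of the first record from the question's input.
my @flds = ('Total Sections', 'ota', '4!n', '01');

my %hash;
# The hash subscript is in scalar context; an array slice in scalar
# context yields its last element, so only $flds[2] becomes the key.
$hash{@flds[0..2]} = $flds[3];

print join(', ', keys %hash), "\n";    # prints "4!n"
```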
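And you can't use a reference to the array either: a reference used as a hash key is turned into a string such as ARRAY(0x...), which reflects where the array lives in memory, not the values stored in it. Take this example code: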
my %hash;
for (my $i = 0; $i < 3; $i++)
{
    my @a = (1, 2, 3);
    $hash{\@a} = 10;
}
print scalar(keys %hash), " key(s)\n";
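Because @a is declared inside the loop, each iteration gets a fresh array, so you may end up with 3 different keys, or with just 1 if perl reuses the array's memory between iterations (which it is free to do, since nothing else holds a reference to it). If you put the my @a; outside of the loop, you always get 1 key. Either way, you can change the contents of the array and it will have no effect on the key.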
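The following sketch (core Perl only; the array contents are arbitrary illustration values) shows what such a key looks like and that changing the array's contents does not change it:

```perl
use strict;
use warnings;

my %hash;
my @a = (1, 2, 3);

# A reference used as a hash key is stringified, e.g. "ARRAY(0x...)".
$hash{\@a} = 10;
my ($key) = keys %hash;
print "$key\n";    # something like ARRAY(0x55...)

# Assigning new contents keeps the same array (same address), so the
# stringified key is unchanged and still says nothing about the values.
@a = (4, 5, 6);
my $same = exists $hash{\@a};
print $same ? "same key\n" : "different key\n";    # prints "same key"
```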
Instead, what you'll have to do is join the three fields into one string and use that string as the key:
push(@{$hash{join("\t",@flds[0..2])} }, $flds[3]);
I've used a tab, but any separator that can never appear in any of the 3 columns will do; that way, if required, you can split the key later to get the original values back.
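Putting this together, here is a minimal sketch of the grouping step that also keeps the keys in first-seen order and joins the grouped values with a space, as in the question's expected output. It extends the approach above and is not the only way to do it. To keep it runnable without a file, the records are hard-coded as the question's $parser->getline calls would return them, and the tab separator assumes the first three columns never contain a tab.

```perl
use strict;
use warnings;

# Records as getline would return them for the question's input
# (hard-coded here so the sketch runs without a CSV file).
my @records = (
    ['Total Sections', 'ota', '4!n', '01'],
    ['Input History', '80', "HHMM28!c1!a[4!a]\n6X\n9X]", '1'],
    ['T (MR)', '17t', '(MTR)', '02'],
    ['Input History', '80', "HHMM28!c1!a[4!a]\n6X\n9X]", '2'],
    ['Reference', ':4!t/1c', ':(Text1)/(Text2)', '30'],
    ['Reference', ':4!t/1c', ':(Text1)/(Text2)', '32'],
);

my $sep = "\t";    # assumed never to occur inside the first 3 columns
my (%hash, @order);

for my $flds (@records) {
    # Join all three key columns, not just the last one.
    my $key = join($sep, @{$flds}[0..2]);
    # Remember each key the first time it is seen, to keep input order.
    push @order, $key unless exists $hash{$key};
    push @{ $hash{$key} }, $flds->[3];
}

my @rows;
for my $key (@order) {
    # Limit -1 keeps trailing empty fields, in case column 3 is empty.
    my @cols = split /\Q$sep\E/, $key, -1;
    # Join the grouped values with a space, as in the expected output.
    push @rows, [ @cols, join(' ', @{ $hash{$key} }) ];
}

# In the real script each row would be written with $csvo->print($sts, $row).
print join(' | ', @$_), "\n" for @rows;
```

In the question's script, the for loop over @records corresponds to the while ($parser->getline($data)) loop, and each row would be written with $csvo->print($sts, $row) instead of the final print, letting Text::CSV decide which fields need quoting. If input order does not matter, iterating over sort keys %hash works just as well as keeping @order.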