I am trying to iterate over a very large CSV file to find all unique strings in each column. For example, a column containing:
John
John
John
Mark
should return John and Mark.
I can't figure out what the problem is with my code, and the error messages aren't helpful either (specifically the 3rd and 4th errors):
"my" variable @found masks earlier declaration in same scope at getdata.pl line 66.
"my" variable $answer masks earlier declaration in same statement at getdata.pl line 67.
syntax error at getdata.pl line 55, near "){"
Global symbol "@master_fields" requires explicit package name (did you forget to declare "my @master_fields"?) at getdata.pl line 58.
syntax error at getdata.pl line 61, near "} else"
Could someone point me in the right direction?
Here is the code I have:
# open file
open my $lines, '<', 'data.csv' or die "Unable to open data.csv\n";
my @records = <$lines>;
close $lines or die "Unable to close data.csv\n"; # Close the input file
# iterate through each line
foreach my $line ( @records ) {
    if ( $csv->parse($line) ) {
        my @master_fields = $csv->fields();
        # if the string is already in the @found array, go to next line.
        if ( grep( /^$master_fields[0]$/, @found ) {
            next;
        }
        else {
            # else; add to the @found array
            push @found, $master_fields[0];
        }
    }
    else {
        warn "Line/record could not be parsed: @yob_records\n";
    }
}
Answer:
if ( grep( /^$master_fields[0]$/, @found ){
should be
if ( grep( /^$master_fields[0]$/, @found ) ){
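Since $master_fields[0] contains a literal string rather than a regex pattern, you need to convert it into a regex pattern that matches that string. \Q...\E does this by escaping any regex metacharacters in the interpolated value.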
grep( /^$master_fields[0]$/, @found )
should be
grep( /^\Q$master_fields[0]\E$/, @found )
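A small sketch of why the \Q...\E matters, using made-up sample values (not from the question): without it, a field such as J.Smith is treated as a pattern in which . matches any character, so it wrongly matches JxSmith. A value containing an unbalanced ( would even make the pattern fail to compile.

```perl
use strict;
use warnings;

# Hypothetical sample data: a field value containing a regex metacharacter.
my $value = 'J.Smith';
my @found = ('JxSmith');

# Unquoted: '.' means "any character", so 'JxSmith' is counted as a match.
my $unquoted_hits = grep( /^$value$/, @found );

# Quoted: \Q...\E escapes the '.', so only a literal 'J.Smith' would match.
my $quoted_hits = grep( /^\Q$value\E$/, @found );

print "unquoted: $unquoted_hits, quoted: $quoted_hits\n";
```

In scalar context grep returns the number of matching elements, so this prints 1 for the unquoted pattern and 0 for the quoted one.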
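Since you want an exact match against $master_fields[0] ($ can also match just before a trailing newline, whereas \z matches only at the very end of the string),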
grep( /^\Q$master_fields[0]\E$/, @found )
should be
grep( /^\Q$master_fields[0]\E\z/, @found )
or better yet,
grep( $_ eq $master_fields[0], @found )
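The following sketch contrasts the three forms, assuming a hypothetical stored value that still ends in a newline (for example, from a line that was never chomped): $ accepts the stray newline, while \z and eq do not. Note that all three still scan the whole array for every line.

```perl
use strict;
use warnings;

# Hypothetical data: the stored entry still carries a trailing newline.
my $value = 'John';
my @found = ("John\n");

# $ can match just before a string-ending newline, so "John\n" is counted.
my $with_dollar = grep( /^\Q$value\E$/, @found );

# \z matches only at the very end of the string, so "John\n" is rejected.
my $with_z = grep( /^\Q$value\E\z/, @found );

# eq compares the strings directly: no quoting, no anchors, no regex engine.
my $with_eq = grep( $_ eq $value, @found );

print "\$: $with_dollar, \\z: $with_z, eq: $with_eq\n";
```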
Finally, you're misusing the CSV parser: let it determine where a record ends by using getline instead of splitting the input on newlines (a quoted field can itself contain a newline). You're also being extremely inefficient, O(N²) instead of O(N), by using an array instead of a hash to track the values you've already seen.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 }); # Or Text::CSV
my $qfn = 'data.csv';
open(my $fh, '<', $qfn)
   or die("Unable to open \"$qfn\": $!\n");

# Hash keys are unique, so each distinct value of the first column is stored once.
my %found;
while ( my $row = $csv->getline($fh) ) {
   ++$found{ $row->[0] };
}

my @found = sort keys %found;
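The code above only collects the first column, while the question asks for the unique strings in every column. Below is a minimal sketch of extending the same hash idea to all columns, assuming each row is an array ref like the ones $csv->getline returns. The helper name unique_per_column and the sample rows are hypothetical, and this version keeps values in first-seen order instead of sorting them.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the distinct values of every column.
# Each row is an array ref, as returned by $csv->getline($fh).
# Returns an array ref holding one list of distinct values per column,
# in the order each value was first seen.
sub unique_per_column {
    my @rows = @_;
    my ( @seen, @unique );
    for my $row (@rows) {
        for my $i ( 0 .. $#$row ) {
            my $value = $row->[$i];
            # $seen[$i]{$value}++ is 0 (false) only the first time a value
            # appears in column $i; hash lookups keep the whole pass O(N).
            push @{ $unique[$i] }, $value unless $seen[$i]{$value}++;
        }
    }
    return \@unique;
}

# Hypothetical rows standing in for parsed CSV records.
my @rows = (
    [ 'John', 'NY' ],
    [ 'John', 'LA' ],
    [ 'John', 'NY' ],
    [ 'Mark', 'NY' ],
);

my $unique = unique_per_column(@rows);
for my $i ( 0 .. $#$unique ) {
    print "column $i: @{ $unique->[$i] }\n";
}
```

For a very large file, you would run the inner loop directly inside while ( my $row = $csv->getline($fh) ) { ... } rather than collecting all rows first, so memory grows with the number of distinct values rather than the number of lines.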