willer2k

Reputation: 546

Perl syntax error without explanation while opening file to manage data

I am trying to go over a very large CSV file to find all unique strings per column. For example:

John
John
John
Mark

should return John and Mark.

I can't figure out what the problem is with my code. The error messages are not helpful either (specifically the 3rd and 4th errors):

"my" variable @found masks earlier declaration in same scope at getdata.pl line 66.
"my" variable $answer masks earlier declaration in same statement at getdata.pl line 67.
syntax error at getdata.pl line 55, near "){"
Global symbol "@master_fields" requires explicit package name (did you forget to declare "my @master_fields"?) at getdata.pl line 58.
syntax error at getdata.pl line 61, near "} else"

Could someone point me in the right direction?

Here is the code I have:

# open file
open my $lines, '<', 'data.csv' or die "Unable to open data.csv\n";
my @records = <$lines>;
close $lines or die "Unable to close data.csv\n";   # Close the input file

# iterate through each line
foreach my $line ( @records ) {

    if ( $csv->parse($line) ) {

        my @master_fields = $csv->fields();

        # if the string is already in the @found array, go to next line.
        if ( grep( /^$master_fields[0]$/, @found ) {
            next;
        }
        else {
            # else; add to the @found array
            push @found, $master_fields[0];
        }        
    }
    else {
        warn "Line/record could not be parsed: @yob_records\n";
    }
}

Upvotes: 1

Views: 135

Answers (1)

ikegami

Reputation: 385655

if ( grep( /^$master_fields[0]$/, @found ){

should be

if ( grep( /^$master_fields[0]$/, @found ) ){

Since $master_fields[0] contains plain text rather than a regex pattern, you need to convert it into a regex pattern by escaping it with \Q...\E.

grep( /^$master_fields[0]$/, @found )

should be

grep( /^\Q$master_fields[0]\E$/, @found )

Since you want an exact match against $master_fields[0] (a bare $ also matches just before a trailing newline),

grep( /^\Q$master_fields[0]\E$/, @found )

should be

grep( /^\Q$master_fields[0]\E\z/, @found )

or better yet,

grep( $_ eq $master_fields[0], @found )
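
To make the difference between these four variants concrete, here is a minimal sketch using only core Perl. The values in $needle and @found are made up for illustration; they are chosen so that '.' and a trailing newline both come into play.

```perl
use strict;
use warnings;

# Hypothetical data: a needle containing a regex metacharacter,
# and candidates that only "almost" equal it.
my $needle = 'a.c';
my @found  = ( 'abc', "a.c\n" );

# grep in scalar context returns the number of matching elements.

# Unescaped: '.' matches any character, and '$' tolerates a trailing newline.
my $n_unescaped = grep { /^$needle$/ } @found;

# \Q...\E escapes the '.', but '$' still tolerates the trailing newline.
my $n_quoted = grep { /^\Q$needle\E$/ } @found;

# \z only matches at the very end of the string.
my $n_exact = grep { /^\Q$needle\E\z/ } @found;

# Plain string equality: no regex involved at all.
my $n_eq = grep { $_ eq $needle } @found;

print "unescaped=$n_unescaped quoted=$n_quoted exact=$n_exact eq=$n_eq\n";
# prints "unescaped=2 quoted=1 exact=0 eq=0"
```

Neither candidate is actually equal to $needle, so only the last two variants give the right answer, and eq does so without building a regex.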

Finally, you're misusing the CSV parser: let it determine where a record ends by using getline instead of splitting on newlines. You're also being extremely inefficient, O(N^2) instead of O(N), by using an array instead of a hash.

use strict;
use warnings;
use Text::CSV_XS;  # Or Text::CSV

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });  # Or Text::CSV

my $qfn = 'data.csv';
open(my $fh, '<', $qfn)
    or die("Unable to open \"$qfn\": $!\n");

my %found;
while ( my $row = $csv->getline($fh) ) {
    ++$found{ $row->[0] };
}

my @found = sort keys %found;
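
The code above only looks at the first column. If you want the unique strings for every column, as the question describes, the same hash trick could be extended roughly like this. This is only a sketch: @rows is hypothetical sample data standing in for the array references that getline returns, so that it runs without a CSV file.

```perl
use strict;
use warnings;

# Hypothetical sample rows, shaped like the array refs returned by getline.
my @rows = (
    [ 'John', 'NY' ],
    [ 'John', 'LA' ],
    [ 'John', 'NY' ],
    [ 'Mark', 'NY' ],
);

# $seen[$col] is a hash ref whose keys are the values seen in that column.
my @seen;
for my $row (@rows) {
    for my $col ( 0 .. $#$row ) {
        ++$seen[$col]{ $row->[$col] };
    }
}

# One sorted list of unique values per column.
my @unique = map { [ sort keys %$_ ] } @seen;

print "column $_: @{ $unique[$_] }\n" for 0 .. $#unique;
# prints:
# column 0: John Mark
# column 1: LA NY
```

With a real file, the outer loop would be the while ( my $row = $csv->getline($fh) ) loop instead of iterating over @rows. Each value is still a single hash lookup, so the whole pass stays linear in the number of cells.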

Upvotes: 4
