Reputation: 191

Check if a value is in a file: best and fastest practice?

I need to check whether a particular value (a string with no whitespace) appears in a file at a specific position. The file contains many lines of data, and on each line the fields are separated by an asterisk followed by a space ('* '). The value we look for is always the sixth field of a line, e.g. (otherval1* otherval2* otherval3* otherval4* otherval5* value-to-get* etc.), but the same value can also appear at another position in a line, and we don't want to match it there. Other users may have the file open during this check, so it needs to be locked with flock. What would be the best and fastest way to do this? I can think of two ways:

my $value = "qt7nxve";
my $completed = 0; #if the value is found in the file at the sixth position of a line, $completed will take the value of 1.

First way: gathering data in an array

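use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, ...

open(my $info, '<', $outfile1) or die "Couldn't open datafile: $!";
flock($info, LOCK_SH) or die "Couldn't lock datafile: $!";    # shared lock: we only read
my @datadone = <$info>;
close($info);

foreach my $line (@datadone) {
    chomp $line;
    my @liner = split(/\* /, $line);
    if (defined $liner[5] && $liner[5] eq $value) { $completed = 1; }
}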

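Second way: reading line by line with while

use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, ...

open(my $info, '<', $outfile1) or die "Couldn't open datafile: $!";
flock($info, LOCK_SH) or die "Couldn't lock datafile: $!";    # shared lock: we only read
while (my $line = <$info>) {    # one line at a time; the whole file is never held in memory
    chomp $line;
    my @liner = split(/\* /, $line);
    if (defined $liner[5] && $liner[5] eq $value) { $completed = 1; }
}
close($info);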

The following code appears faster, but I can't use it: it would also match the value at a position other than the sixth, and $completed would then incorrectly be set to 1.

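use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, ...

open(my $info, '<', $outfile1) or die "Couldn't open datafile: $!";
flock($info, LOCK_SH) or die "Couldn't lock datafile: $!";
while (my $line = <$info>) {
    if ($line =~ m/\Q$value\E/) {    # \Q...\E: match $value as a literal string, anywhere in the line
        $completed = 1;
    }
}
close($info);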

So what is the best and fastest practice (including any other way)?

Upvotes: 0

Answers (3)

ThisSuitIsBlackNot

Reputation: 24073

You asked for the best and fastest way, but you haven't specified what "best" means; nor have you given any reason to believe that you're not trying to perform premature optimization. Keeping those points in mind, here is a solution.

When dealing with delimited data, I like to use Text::CSV. You can use any single-byte character as a delimiter. If you also have Text::CSV_XS installed, you will get a performance boost. The following prints "found" if the specified value appears in the sixth column of any line in the file. Parsing stops as soon as a match is found.

Note that this will not work if any of your fields can contain * characters (unless there is some sort of quoting mechanism like in the CSV spec).

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

use Text::CSV;

my $csv = Text::CSV->new({
    sep_char => '*',
    allow_whitespace => 1,
    binary => 1,
    auto_diag => 1
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'infile' or die $!;

my $found;
my $value = 'qt7nxve';

while (my $row = $csv->getline($fh)) {
    if (defined $row->[5] && $row->[5] eq $value) {
        $found = 1;
        last;
    }
}

close $fh;

say 'found' if $found;

Upvotes: 1

Axeman

Reputation: 29854

The fastest I can possibly think is to make the regex do your column-counting for you:

qr/ ^              # anchor at the start of the line
    (?:            # open non-capturing group
        [^*]*      # one column: any run of non-star characters...
        \* [ ]     # ...followed by a star and a space
    ){5}           # five of these groups
    \Q$value\E     # the value you are looking for, matched literally
    \* [ ]         # closed by a star and a space
  /x; # <- allows eXpanded notation

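Because you specify '* ' as the column delimiter, each column ends at the next '* '. Matching [^*]* (any run of characters other than a star) before the delimiter means each group consumes exactly one column and can never run on into the next, and the ^ anchor makes the count start at the first column. A lazy .*? is not enough here: when the rest of the pattern fails, the regex engine backtracks and lets .*? extend past a delimiter, so the value would also be found in a later column. Thus you want to find five of these groups, followed by the value you are looking for and a closing '* '.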
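To see the difference the anchor and the negated character class make, here is a small self-contained sketch that runs both patterns against two sample lines. The sample lines and the variable names $lazy, $anchored and %lines are made up for illustration, and, like the answer, it assumes no field contains a literal '*'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value = 'qt7nxve';

# Lazy version: unanchored, and .*? may run past a '* ' when backtracking.
my $lazy = qr/(?:.*?\*[ ]){5}\Q$value\E\*[ ]/;

# Anchored version: each group consumes exactly one column.
my $anchored = qr/^(?:[^*]*\*[ ]){5}\Q$value\E\*[ ]/;

# Hypothetical sample lines: the value in the sixth and in the eighth column.
my %lines = (
    sixth  => "a* b* c* d* e* qt7nxve* g* ",
    eighth => "a* b* c* d* e* f* g* qt7nxve* ",
);

for my $name (sort keys %lines) {
    my $line = $lines{$name};
    printf "%-7s lazy=%d anchored=%d\n",
        $name, ($line =~ $lazy ? 1 : 0), ($line =~ $anchored ? 1 : 0);
}
```

Both patterns match the "sixth" line, but only the lazy one also matches the "eighth" line, which is exactly the false positive the question wants to avoid.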

Inside your read loop, I would use it like this:

last if $completed = m/^(?:[^*]*\*[ ]){5}\Q$value\E\*[ ]/;

This approach assumes that there is not some elaborate escaping or quoting protocol that would allow a literal '* ' in the column.

Upvotes: 2

Zaid

Reputation: 37146

Consider using last to leave whatever loop construct you are using once a match is found:

my $completed = 0;
while (<INFO>) {
    my @data = split /\* /, $_;

    if (defined $data[5] && $data[5] eq $value) {
        $completed = 1;
        last;
    }
}
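Putting the pieces from the question and this answer together, a complete check might look like the following sketch. The sub name value_in_sixth_column is hypothetical. It takes a shared lock (LOCK_SH from the core Fcntl module), which is enough when you only read the file, and passes a limit of 7 to split so that Perl produces at most seven fields and does not break up the rest of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);    # LOCK_SH, LOCK_EX, ...

# Hypothetical helper: returns 1 if $value is the sixth '* '-separated
# field of any line in $path, 0 otherwise.
sub value_in_sixth_column {
    my ($path, $value) = @_;

    open(my $fh, '<', $path) or die "Couldn't open $path: $!";
    flock($fh, LOCK_SH)      or die "Couldn't lock $path: $!";

    my $found = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # A limit of 7: the sixth field is separated, the rest stays in field 7.
        my @data = split /\* /, $line, 7;
        if (defined $data[5] && $data[5] eq $value) {
            $found = 1;
            last;    # no need to read the rest of the file
        }
    }

    close($fh);    # closing the handle also releases the lock
    return $found;
}
```

Usage with the variables from the question: my $completed = value_in_sixth_column($outfile1, $value);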

Upvotes: 4
