Reputation: 191
I need to check whether a particular value (a string with no whitespace) is in a file, at a specific position. The file contains lines of data, and each line consists of values separated by * followed by a space. The value we are looking for is always the sixth value of a line, e.g. (otherval1* otherval2* otherval3* otherval4* otherval5* value-to-get* etc.). The same value could also appear at another position in a line, but we don't want to match it there. The file can be opened by other users during this check, so it needs to be locked with flock. What would be the best and fastest way to do it? I can think of two ways:
my $value = "qt7nxve";
my $completed = 0; #if the value is found in the file at the sixth position of a line, $completed will take the value of 1.
First way: gathering data in an array
open(INFO, "$outfile1") or die ("Couldn't open datafile");
flock(INFO, 2);
my @datadone = <INFO>;
close(INFO);
foreach my $line (@datadone) {
    my @liner = split(/\* /, $line);
    if ($liner[5] eq $value) { $completed = 1; }
}
Second way: using while and an array
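open(INFO, "$outfile1") or &dienice("Couldn't open datafile");
flock(INFO, 2);
# Note: the while condition reads the first line into $_ before the
# slurp below takes the rest, so the first line is never checked.
while (<INFO>) {
    my @datadone = <INFO>;
    foreach my $line (@datadone) {
        my @liner = split(/\* /, $line);
        if ($liner[5] eq $value) { $completed = 1; }
    }
}
close(INFO);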
The following code appears faster, but it can't be used, since it could find the value at another position than the sixth position in a line and $completed would incorrectly take the value of 1.
open(INFO, "$outfile1") or die ("Couldn't open datafile");
flock(INFO, 2);
while (<INFO>) {
    if ( $_ =~ m/$value/) {
        $completed = 1;
    }
}
close(INFO);
So what is the best and fastest practice (including any other way)?
Upvotes: 0
Views: 108
Reputation: 24073
You asked for the best and fastest way, but you haven't specified what "best" means; nor have you given any reason to believe that you're not trying to perform premature optimization. Keeping those points in mind, here is a solution.
When dealing with delimited data, I like to use Text::CSV. You can use any single-byte character as a delimiter. If you also have Text::CSV_XS installed, you will get a performance boost. The following will print "found" if the specified value is found in the sixth column anywhere in the file. Parsing stops as soon as a match is found.
Note that this will not work if any of your fields can contain * characters (unless there is some sort of quoting mechanism like in the CSV spec).
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my $csv = Text::CSV->new({
    sep_char         => '*',
    allow_whitespace => 1,
    binary           => 1,
    auto_diag        => 1
}) or die "Cannot use CSV: " . Text::CSV->error_diag();
open my $fh, '<', 'infile' or die $!;
my $found;
my $value = 'qt7nxve';
while (my $row = $csv->getline($fh)) {
    if ($row->[5] eq $value) {
        $found = 1;
        last;
    }
}
close $fh;
say 'found' if $found;
Upvotes: 1
Reputation: 29854
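The fastest approach I can think of is to make the regex do the column counting for you:
qr/ ^                # anchor at the start of the line
    (?:              # open non-capturing group
        [^*]*        # anything that is not a star, up until...
        \* [ ]       # a star and a space
    ){5}             # five of these groups
    value-to-get     # the literal value you are looking for
    \* [ ]           # closed by a star and a space
/x;                  # <- allows eXpanded notation
Because you specify '* ' as the column delimiter, each column has to end at '* '. Anchoring the pattern at the start of the line and matching only non-star characters ([^*]*) before each delimiter means that every group consumes exactly one column. A lazy .*? without the anchor would not be enough, because the regex engine can backtrack across delimiters or start the match in the middle of the line, and then the value would also match in a later column. Thus you want to find 5 of these groups and then the value you are looking for, followed by '* '.
If the value comes from a variable, interpolate it as \Q$value\E so that any regex metacharacters in it are taken literally.
Inside the read loop, I would write the test like this:
last if $completed = m/^(?:[^*]*\*[ ]){5}value-to-get\*[ ]/;
This approach assumes that there is not some elaborate escaping or quoting protocol that would allow a literal '*' inside a column.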
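To see the column counting in action, here is a minimal sketch that runs an anchored form of the pattern (^ plus [^*]* for each field) against a few made-up lines. The helper name sixth_column_is and the sample data are hypothetical, and it assumes that no field contains a '*'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the sixth '* '-separated field of $line
# is exactly $value, 0 otherwise. Assumes fields never contain a literal '*'.
sub sixth_column_is {
    my ($line, $value) = @_;
    # \Q...\E quotes any regex metacharacters in the value.
    return $line =~ m/^(?:[^*]*\*[ ]){5}\Q$value\E\*[ ]/ ? 1 : 0;
}

my $value = 'qt7nxve';

# Made-up sample lines: value in column 6, in column 7, and in column 2.
my @lines = (
    "a* b* c* d* e* qt7nxve* g* \n",
    "a* b* c* d* e* f* qt7nxve* \n",
    "a* qt7nxve* c* d* e* f* g* \n",
);

for my $line (@lines) {
    print sixth_column_is($line, $value) ? "match\n" : "no match\n";
}
```

Only the first sample line should report a match, since the other two carry the value in a different column.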
Upvotes: 2
Reputation: 37146
Consider using last to leave whatever loop construct you are using once a match is found:
my $completed = 0;
while (<INFO>) {
    my @data = split /\* /, $_;
    if ($data[5] eq $value) {
        $completed = 1;
        last;
    }
}
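Putting that loop together with the open and the lock from the question might look like the sketch below. It assumes that a shared lock (LOCK_SH) is enough because the file is only read here, whereas the question takes an exclusive lock. The sub name value_in_sixth_column is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, LOCK_UN

# Hypothetical helper: returns 1 if $value is the sixth '* '-separated
# field of any line in the file at $path, 0 otherwise.
sub value_in_sixth_column {
    my ($path, $value) = @_;

    open my $fh, '<', $path or die "Couldn't open $path: $!";
    # Assumption: readers only need a shared lock; writers would take LOCK_EX.
    flock($fh, LOCK_SH) or die "Couldn't lock $path: $!";

    my $completed = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my @data = split /\* /, $line;
        # Guard against short lines so 'eq' never sees undef.
        if (defined $data[5] && $data[5] eq $value) {
            $completed = 1;
            last;    # stop reading as soon as the value is found
        }
    }

    close $fh;    # closing the handle also releases the lock
    return $completed;
}
```

It could then be called as my $completed = value_in_sixth_column($outfile1, $value); with the variables from the question.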
Upvotes: 4