Yash

Reputation: 3114

Read a file line by line for an exact match using a Perl script

I have written a Perl script that reads an input file line by line and searches each line for given strings. I tried two implementations, one using the built-in Perl function grep and one using index, but neither gives me an exact string match.

My sample code, input file, and desired output are shown below. Please help me understand what is wrong with the script so that I can get the required output.

SAMPLE_CODE

#!/usr/bin/perl

my $myfile = "/path/to/the/file/list.txt";
my $details = "1234,5678";
my @required;

open FH, "$myfile" or die "Cannot open file for reading\n";
while(<FH>)
{
    $line = $_;
    chomp $line;
    @list = split(/\,/, $details);

    foreach my $var (@list)
    {
        chomp($var);
        #if (grep /$var/, $line)            # partially working
        if (index($line, $var) >= 0)        # partially working
        {
            my @arr = split(/[\:]/, $line);
            push (@required, $arr[0]);
        }
    }
}
close FH;

print "required array is @required \n"; 

INPUT_FILE

$>  cat /path/to/the/file/list.txt

CAT:1234,5678
RAT:12345,9871

OUTPUT

required array is CAT CAT RAT 

DESIRED_OUTPUT

required array is CAT CAT

The problem is that $details contains the string 1234, so the grep or index check should not match the second line of list.txt, which contains 12345 rather than 1234.

How can I fix this so that only exact matches are accepted?

Upvotes: 3

Views: 3354

Answers (3)

Aditya

Reputation: 364

So you need to print the first field of every line if any of its other fields (2nd, 3rd, etc.) match certain criteria?

printf 'CAT:1234,5678\nRAT:12345,9871\n' |
perl -F'/:|,/' -lane 'foreach (@F) { push @required, $F[0] if /\b1234\b|\b5678\b/ }
                      END{ print "The required array: @required" }'

Output:

The required array: CAT CAT

Together with -a (autosplit), the -F'/:|,/' option tells Perl to split each line on either : or , and fill the special array @F with the resulting fields: $F[0] gets the first field, $F[1] the second, and so on.

If any field of the line (foreach (@F)) matches 1234 or 5678 (if /\b1234\b|\b5678\b/), the first field of that line is pushed onto the @required array (push @required, $F[0]).
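For readers less familiar with the command-line switches, the one-liner behaves roughly like the explicit script below. This is a sketch, not a literal expansion of what perl generates: the sample data is embedded in an in-memory filehandle instead of being read from a file, and the variable names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input held in a string (stands in for the real data file)
my $data = "CAT:1234,5678\nRAT:12345,9871\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @required;
while (my $line = <$fh>) {
    chomp $line;                    # roughly what -l does to each input line
    my @F = split /:|,/, $line;     # what -a does with -F'/:|,/'
    foreach (@F) {
        push @required, $F[0] if /\b1234\b|\b5678\b/;
    }
}
close $fh;

# Equivalent of the END { } block: print once, after all lines are read
print "The required array: @required\n";
```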

To read the data from a file:

perl -F'/:|,/' -lane 'foreach (@F) { push @required, $F[0] if /\b1234\b|\b5678\b/ } 
                      END{ print "The required array: @required" }' yourData.txt

Upvotes: -1

Polar Bear

Reputation: 6798

As already pointed out, your code matches partial patterns, which is not what you want. You need an exact match, and regular expressions provide \b to mark the boundary of an element (a word boundary).
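A quick way to see what \b changes is to run both patterns against the two lines from the question. This small sketch only uses the sample values; note that \b works here because the numbers are separated by non-word characters (: and ,), so it would not help if the fields were joined by letters or underscores.

```perl
use strict;
use warnings;

my $var = '1234';
my %bounded;    # line => 1 if /\b$var\b/ matches, 0 otherwise

for my $line ('CAT:1234,5678', 'RAT:12345,9871') {
    my $plain = $line =~ /$var/     ? 1 : 0;   # substring match: true for both lines
    my $exact = $line =~ /\b$var\b/ ? 1 : 0;   # whole-number match only
    $bounded{$line} = $exact;
    print "$line: plain=$plain boundary=$exact\n";
}
```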

It is good practice to include the following at the beginning of the script:

use strict;
use warnings;

These pragmas let Perl catch and warn you about unintended behavior in your code.

In this case you could use <> (the null filehandle, also called the diamond operator) instead of opening a filehandle yourself. It simplifies the code and lets the script be used either as script.pl list.txt or as cat list.txt | script.pl.

NOTE: @list = split(/,/, $details); should be placed outside the loop: $details does not change from line to line, so splitting it once saves CPU cycles.

The following snippet produces the desired output:

#!/usr/bin/env perl
#
# vim: ai ts=4 sw=4

use strict;
use warnings;
use feature 'say';

my $details = "1234,5678";
my(@list, @required);

@list = split(/,/, $details);

while(<>) {
    for my $element ( @list ) {
        if( /\b$element\b/ ) {
            my @arr = split(/[:,]/, $_);
            push @required, $arr[0];
        }
    }
}

say "Required array is @required";

Output

Required array is CAT CAT

Reference: <>, $_, split

Upvotes: 0

TLP

Reputation: 67900

Your problem is that both kinds of matching you are doing, grep /$var/ and index($line, $var), allow lines to match partially. For example:

12345
^^^^  <---- matches 1234

This is much the same way that /car/ would partially match carpet or scarlet.
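The same effect is visible directly with index, which reports where a substring occurs rather than whether two strings are equal. A minimal illustration using the values above:

```perl
use strict;
use warnings;

# index() returns the position of the first occurrence, or -1 if absent,
# so any result >= 0 means "found somewhere inside", not "equal to"
my $pos_number = index('12345', '1234');    # found at position 0
my $pos_word   = index('scarlet', 'car');   # found at position 1, after the 's'

print "index('12345', '1234')  = $pos_number\n";
print "index('scarlet', 'car') = $pos_word\n";
```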

What you should probably do is isolate the numbers, put them in an array, and compare them numerically. For example:

my ($name, @nums) = split /[:,]/, $line;     # split into all fields at once
for my $num (@nums) {
    for my $num2 (@list) {
        if ($num == $num2) {                 # check numerical equality
             push @required, $name;
        }
    }
}
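Combined with the question's input, the loop above can be run as a complete script. This sketch reuses the question's $details, @list and @required names, and assumes the file contents can be embedded in an in-memory filehandle for demonstration; a real script would open list.txt instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $details = "1234,5678";
my @list    = split /,/, $details;     # split the search values once

# Stand-in for /path/to/the/file/list.txt
my $data = "CAT:1234,5678\nRAT:12345,9871\n";
open my $fh, '<', \$data or die "Cannot open input: $!";

my @required;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, @nums) = split /[:,]/, $line;   # name first, then the numbers
    for my $num (@nums) {
        for my $num2 (@list) {
            push @required, $name if $num == $num2;   # numeric equality
        }
    }
}
close $fh;

print "required array is @required\n";
```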

Or, if as your comment implies your fields are strings, you can use eq to check for equality. Alternatively, use anchors in your regex, /^$var$/, to force a complete match: ^ means beginning of the string and $ means end of the string. For example:

"car" eq "carpet"     # false
"car" eq "car"        # true
"carpet" =~ /^car$/   # false
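The anchors only help when the regex is applied to an isolated field; /^1234$/ tested against the whole line CAT:1234,5678 would fail. A sketch applying both eq and an anchored match to each field; the \Q...\E quoting is an addition not in the answer, so that any regex metacharacters in a search value are treated literally.

```perl
use strict;
use warnings;

my @list = ('1234', '5678');
my (@by_eq, @by_regex);

for my $line ('CAT:1234,5678', 'RAT:12345,9871') {
    my ($name, @fields) = split /[:,]/, $line;
    for my $field (@fields) {
        for my $var (@list) {
            push @by_eq,    $name if $field eq $var;           # string equality
            push @by_regex, $name if $field =~ /^\Q$var\E$/;   # anchored, literal match
        }
    }
}

print "eq:    @by_eq\n";
print "regex: @by_regex\n";
```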

More efficiently, you can store the numbers to test for in a hash, for example:

my %list = map { $_ => 1 } split /,/, $details;
...
if ($list{$num}) {        # check if the value is true
    push @required, $name;
}
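Filled out into a complete script, the hash version might look like the sketch below. It embeds the sample file in an in-memory filehandle, and it tests with exists rather than the value's truth; both behave the same here because every value is 1. Hash keys are compared as strings, so a field such as 01234 would not count as a match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $details = "1234,5678";
# Lookup table: each search value becomes a hash key
my %list = map { $_ => 1 } split /,/, $details;

# Stand-in for the real input file
my $data = "CAT:1234,5678\nRAT:12345,9871\n";
open my $fh, '<', \$data or die "Cannot open input: $!";

my @required;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, @nums) = split /[:,]/, $line;
    for my $num (@nums) {
        push @required, $name if exists $list{$num};   # constant-time lookup per field
    }
}
close $fh;

print "required array is @required\n";
```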

Upvotes: 2
