user3773222

Reputation: 41

perl parsing files for multiple strings

I have been learning Perl for the past two weeks and have been writing some Perl scripts for my school project. I need to parse a text file for multiple strings. I searched Perl forums and found some information. The function below parses a text file for one string and returns a result. However, I need the script to search the file for multiple strings.

use strict;
use warnings;


sub find_string {
    my ($file, $string) = @_;
    open my $fh, '<', $file;
    while (<$fh>) {
        return 1 if /\Q$string/;
    }
    die "Unable to find string: $string";
}

find_string('filename', 'string');

Now, for instance, suppose the file contains multiple strings, including regular expression characters, as listed below:

"testing"
http://www.yahoo.com =1
http://www.google.com=2

I want the function to search for multiple strings, like this:

find_string('filename', 'string1','string2','string3');

Can somebody please explain how I need to do that? It would be really helpful.

Upvotes: 3

Views: 1543

Answers (3)

David W.

Reputation: 107040

Going through this very quickly here:

Right now you pass the name of a file and one string. What if you passed multiple strings?

if ( find_string ( $file, @strings ) ) {
    print "Found a string!\n";
}
else {
    print "No string found\n";
}


..

sub find_string {
    my $file    = shift;
    my @strings = @_;
    #
    # Let's make the strings into a regular expression
    #
    my $reg_exp = join "|" ,@strings;   # Regex is $string1|$string2|$string3...

    open my $fh, "<", $file or die qq(Can't open file...);
    while ( my $line = <$fh> ) {
       chomp $line;
       if ( $line =~ $reg_exp ) {
           return 1;     # Found the string
       }
    }
    return 0;            # String not found
}

I am about to go into a meeting, so I haven't really even tested this, but the idea is there. A few things:

  • You want to handle characters in your strings that could be regular expression metacharacters. You can use either the quotemeta function, or put \Q and \E before and after each string.
  • Think about using use autodie to handle files that can't be opened. Then you don't have to check your open statement (like I did above).
  • There are limitations. This would be awful if you were searching for 1,000 different strings, but should be okay with a few.
  • Note how I use a scalar file handle ($fh). Instead of opening your file inside the subroutine, I would pass in a scalar file handle. This would allow you to take care of an invalid file issue in your main program. That's the big advantage of scalar file handles: they can be easily passed to subroutines and stored in class objects.
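
The escaping mentioned in the first bullet can be sketched roughly as follows. This is only an illustration: the helper name build_pattern is hypothetical, and the sample strings are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): build one alternation
# regex in which every input string is matched literally.
sub build_pattern {
    my @strings = @_;
    # quotemeta escapes regex metacharacters such as . + * and /
    my $alternation = join '|', map { quotemeta($_) } @strings;
    return qr/$alternation/;
}

my $pattern = build_pattern('http://www.yahoo.com', 'fo+*o');

# The dots are literal now, so "wwwXyahooXcom" must not match.
print "literal match\n"  if 'see http://www.yahoo.com =1' =~ $pattern;
print "no false match\n" if 'http://wwwXyahooXcom' !~ $pattern;
```

Writing /\Q$string\E/ inside a match has the same effect for a single string; quotemeta is convenient when several strings are joined first.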

Tested Program

#! /usr/bin/env perl
#

use strict;
use warnings;
use autodie;
use feature qw(say);

use constant {
    INPUT_FILE =>       'test.txt',
};


open my $fh, "<", INPUT_FILE;

my @strings = qw(foo fo+*o bar fubar);

if ( find_string ( $fh, @strings ) ) {
    print "Found a string!\n";
}
else {
    print "No string found\n";
}

sub find_string {
    my $fh    = shift;          # The file handle
    my @strings = @_;           # A list of strings to look for

    #
    # We need to go through each string and escape any
    # special regular expression characters
    #
    for my $string ( @strings ) {
        $string = quotemeta $string;
    }

    #
    # Let's join the strings into one big regular expression
    #
    my $reg_exp = join '|', @strings;   # Regex is $string1|$string2|$string3...
    $reg_exp = qr($reg_exp);            # This is now a compiled regular expression

    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ $reg_exp ) {
            return 1;     # Found the string
        }
    }
    return 0;            # String not found
}
  • autodie handles issues when I can't open a file. No need to check for it.
  • Notice I have three parameters in my open. This is the preferred way.
  • My file handle is $fh, which allows me to pass it to my find_string subroutine. I open the file in the main program, so I can handle read errors there.
  • I loop through my @strings and use the quotemeta function to automatically escape special regular expression characters.
  • Note that when I change $string in my loop, it actually modifies the @strings array, because the loop variable is an alias for each element.
  • I use qr to create a regular expression.
  • My regular expression is /foo|fo\+\*o|bar|fubar/.
  • There are a few bugs. For example, the string fooburberry will match foo. Do you want that, or do you want your strings to be whole words?
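
If whole words are what you want, one possible approach is to wrap the alternation in \b word-boundary anchors. A minimal sketch, assuming every search string starts and ends with a word character (\b does not behave this way next to punctuation); the variable name $whole_word is only for illustration:

```perl
use strict;
use warnings;

my @strings = qw(foo bar fubar);

# Escape each string, then require a word boundary on both sides.
my $alternation = join '|', map { quotemeta($_) } @strings;
my $whole_word  = qr/\b(?:$alternation)\b/;

for my $line ('fooburberry', 'a foo here') {
    my $verdict = $line =~ $whole_word ? 'match' : 'no match';
    print "$line: $verdict\n";
}
# prints:
#   fooburberry: no match
#   a foo here: match
```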

Upvotes: 2

cartman

Reputation: 762

I think you can store the file content in an array first, then grep the array for each input string.

use strict;
use warnings;

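sub find_multi_string {
    my ($file, @strings) = @_;
    # Three-argument open, with an error check
    open my $fh, '<', $file or die "Cannot open $file: $!";
    # Store the whole file in an array
    my @array = <$fh>;
    close $fh;

    for my $string (@strings) {
        # \Q...\E makes the string match literally, not as a regex
        if (grep { /\Q$string\E/ } @array) {
            next;
        } else {
            die "Cannot find $string in $file";
        }
    }

    return 1;
}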
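
Since the strings are meant literally, a variant of the same idea can use index instead of a regex, and return the missing strings rather than dying on the first one. This is only a sketch: the helper name missing_strings is hypothetical, and the lines are hard-coded from the question's sample file instead of being read from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the strings that occur in none of the lines.
sub missing_strings {
    my ($lines, @strings) = @_;    # $lines is an array reference
    my @missing;
    for my $string (@strings) {
        # index returns -1 when the substring is absent;
        # grep in scalar context counts the matching lines
        my $found = grep { index($_, $string) >= 0 } @$lines;
        push @missing, $string unless $found;
    }
    return @missing;
}

my @lines = ("\"testing\"\n", "http://www.yahoo.com =1\n", "http://www.google.com=2\n");
my @missing = missing_strings(\@lines, 'testing', 'http://www.google.com', 'bing');
print "Missing: @missing\n";    # prints "Missing: bing"
```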

Upvotes: 0

hmatt1

Reputation: 5139

I'm happy to see use strict and use warnings in your script. Here is one basic way to do it.

use strict;
use warnings;


sub find_string {

    my ($file, $string1, $string2, $string3) = @_;

    my $found1 = 0;
    my $found2 = 0;
    my $found3 = 0;

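    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (<$fh>) {
        # \Q...\E treats each string literally, as in the question's code
        if ( /\Q$string1\E/ ) {
            $found1 = 1;
        }
        if ( /\Q$string2\E/ ) {
            $found2 = 1;
        }
        if ( /\Q$string3\E/ ) {
            $found3 = 1;
        }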
    }

    if ( $found1 == 1 and $found2 == 1 and $found3 == 1 ) {
        return 1;
    } else {
        return 0;
    }
}

my $result = find_string('filename', 'string1', 'string2', 'string3');

if ( $result == 1 ) {
    print "Found all three strings\n";
} else {
    print "Didn't find all three\n";
}
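
The same idea can be extended to any number of strings by tracking the ones not yet seen in a hash. A rough sketch, not a drop-in replacement: the helper name find_all_strings is hypothetical, it takes an already opened file handle, and the demo reads from an in-memory file built from the question's sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every string occurs somewhere in the input.
sub find_all_strings {
    my ($fh, @strings) = @_;
    my %pending = map { $_ => 1 } @strings;    # strings not seen yet
    while (my $line = <$fh>) {
        for my $string (keys %pending) {
            # \Q...\E matches the string literally
            delete $pending{$string} if $line =~ /\Q$string\E/;
        }
        return 1 unless %pending;              # everything found, stop early
    }
    return %pending ? 0 : 1;
}

# In-memory file standing in for the real one (sample data from the question)
my $text = "\"testing\"\nhttp://www.yahoo.com =1\nhttp://www.google.com=2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $all_found = find_all_strings($fh, 'testing', 'yahoo', 'google');
print $all_found ? "Found all strings\n" : "Didn't find all strings\n";
# prints "Found all strings"
```

Stopping as soon as the hash is empty avoids reading the rest of a large file once every string has been seen.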

Upvotes: 0
