I'm writing a Perl script that should open a file containing many strings (one string per line), check whether each of these strings is present in another file (the search file), and print each occurrence. I have written the code below to find one particular string. How can I improve it to handle a list of strings read from a file?
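use strict;
use warnings;

open(my $data, '<', 'filetosearch.txt') or die "Couldn't open file filetosearch.txt for reading: $!";
my $find = "word or string to find";
#open FILE, "<signatures.txt";
my @lines = <$data>;
print "Lines that matched $find\n";
for (@lines) {
    # \Q...\E makes $find match literally, even if it contains regex characters
    if ($_ =~ /\Q$find\E/) {
        print $_;    # $_ already ends with a newline
    }
}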
I'd try something like this:
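use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Tie both files read-only so nothing is ever written back to disk.
tie my @lines, 'Tie::File', 'filetosearch.txt', mode => O_RDONLY
    or die "Cannot tie filetosearch.txt: $!";
tie my @patterns, 'Tie::File', 'patterns.txt', mode => O_RDONLY
    or die "Cannot tie patterns.txt: $!";

my @result;
foreach my $pattern (@patterns)
{
    next unless length $pattern;    # an empty pattern would match every line
    # Quote into a copy: assigning to $pattern itself would try to
    # write the escaped text back into patterns.txt.
    my $quoted = quotemeta $pattern;
    push @result, grep { /$quoted/ } @lines;
}
print "$_\n" for @result;    # Tie::File strips the trailing newlines by default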
Here's another option:
use strict;
use warnings;
my $searchFile = pop;
my @strings = map { chomp; "\Q$_\E" } <>;
my $regex = '(?:' . ( join '|', @strings ) . ')';
push @ARGV, $searchFile;
while (<>) {
print if /$regex/;
}
Usage: perl script.pl strings.txt searchFile.txt [>outFile.txt]
The last, optional part (>outFile.txt) is a shell redirection that sends the output to a file.
First, the search file's name is (implicitly) popped off @ARGV and saved for later. Then the strings file is read (<>), and map is used to chomp each line and escape its metacharacters (the \Q and \E, in case a string contains regex characters such as '.' or '*'); the results are stored in an array. The array's elements are joined with the regex alternation character (|) to form, in effect, an OR of all the strings, which is matched against each line of the search file. Next, the search file's name is pushed onto @ARGV so that the second <> loop reads its lines. Each of those lines is printed if at least one of the strings is found on it.
Hope this helps!
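The alternation idea can also be tried without any files. This is a minimal sketch that assumes the strings and lines are already in memory; `build_regex` and `matching_lines` are hypothetical helper names, and precompiling with `qr//` and skipping empty strings are additions, not part of the answer above.

```perl
use strict;
use warnings;

# Build one precompiled regex that matches any of the given literal strings.
sub build_regex {
    my @strings = grep { length } @_;    # an empty alternative would match every line
    return qr/(?!)/ unless @strings;     # no strings: a pattern that never matches
    my $alternation = join '|', map { quotemeta($_) } @strings;
    return qr/(?:$alternation)/;
}

# Return the lines that contain at least one of the strings.
sub matching_lines {
    my ($strings, $lines) = @_;
    my $regex = build_regex(@$strings);
    return grep { /$regex/ } @$lines;
}

my @strings = ('a.b', 'foo');
my @lines   = ("xa.by\n", "axb\n", "food\n", "bar\n");
print for matching_lines(\@strings, \@lines);    # prints "xa.by" and "food"
```

Because the dot in 'a.b' is escaped, "axb" is not reported; only lines containing a literal "a.b" or "foo" are.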
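Maybe something like this will do the job:
use strict;
use warnings;

open my $fh1, '<', 'filetosearch.txt' or die "Cannot open filetosearch.txt: $!";
my @arrFileToSearch = <$fh1>;
close $fh1;
open my $fh2, '<', 'signatures.txt' or die "Cannot open signatures.txt: $!";
chomp(my @arrSignatures = <$fh2>);
close $fh2;
# Escape special characters once, before the loop; doing it inside the
# loop would re-escape every signature on each pass over the file.
my @patterns = map { quotemeta($_) } grep { length } @arrSignatures;
for my $i (0 .. $#arrFileToSearch) {
    foreach my $pattern (@patterns) {
        if ($arrFileToSearch[$i] =~ /$pattern/) {
            my $j = $i - 3;    # or any other index that you want
            # guard: a negative index would wrap around to the end of the array
            print $arrFileToSearch[$j] if $j >= 0 && $j <= $#arrFileToSearch;
        }
    }
}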
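If the goal is to print each occurrence together with where it was found, as the question asks, the loop above can collect the line number and the matching signature instead of printing a line at a fixed offset. A minimal sketch, assuming both files are already read into arrays; `find_occurrences` is a hypothetical helper name, not part of the original answer.

```perl
use strict;
use warnings;

# Return [line_number, signature] pairs, one per signature found on a line.
# Line numbers are 1-based; empty signatures are skipped because they
# would match every line.
sub find_occurrences {
    my ($signatures, $lines) = @_;
    my @sigs = grep { length } @$signatures;
    my @hits;
    for my $i (0 .. $#$lines) {
        for my $sig (@sigs) {
            # \Q...\E escapes regex metacharacters in the signature
            push @hits, [ $i + 1, $sig ] if $lines->[$i] =~ /\Q$sig\E/;
        }
    }
    return @hits;
}

my @signatures = ('a.b', 'foo');
my @lines      = ("foo a.b\n", "x\n", "food\n");
printf "line %d: %s\n", @$_ for find_occurrences(\@signatures, \@lines);
```

With this sample data it prints "line 1: a.b", "line 1: foo" and "line 3: foo". A line that contains several signatures is reported once per signature.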
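Ok, something like this will be faster: testmatch returns as soon as it finds the first matching line instead of scanning the rest of the file.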
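use strict;
use warnings;

# Returns 1 as soon as any line in @$linesref contains $find, else 0.
sub testmatch
{
    my ($find, $linesref) = @_;
    for (@$linesref) { return 1 if /\Q$find\E/; }
    return 0;
}

{
    open(my $data, '<', 'filetosearch.txt') or die "Cannot open filetosearch.txt: $!";
    my @lines = <$data>;
    open(my $src, '<', 'tests.txt') or die "Cannot open tests.txt: $!";
    while (my $find = <$src>)
    {
        chomp $find;    # drop the newline so it is not part of the pattern
        if (testmatch($find, \@lines)) { print "a match: $find\n" }
    }
}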
If it's a full-line-to-full-line match, you can store each line of one file as a key in a hash and just test for existence:
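{
    open(my $data, '<', 'filetosearch.txt') or die "Cannot open filetosearch.txt: $!";
    chomp(my @data_lines = <$data>);
    my %lines;
    @lines{@data_lines} = ();    # each line becomes a hash key
    open(my $src, '<', 'tests.txt') or die "Cannot open tests.txt: $!";
    while (my $line = <$src>)
    {
        chomp $line;
        # exists() replaces the smartmatch ($_ ~~ %lines), which is experimental and deprecated
        if (exists $lines{$line}) { print "a match\n" }
    }
}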
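The hash approach can be isolated into a small function. This is a minimal sketch that assumes both inputs are already chomped arrays in memory; `whole_line_matches` is a hypothetical name. Each test line costs one hash lookup rather than a regex scan over every line of the search file.

```perl
use strict;
use warnings;

# Return the lines of @$tests that appear, as a whole line, in @$search.
sub whole_line_matches {
    my ($search, $tests) = @_;
    my %seen;
    @seen{@$search} = ();    # hash slice: every search line becomes a key
    return grep { exists $seen{$_} } @$tests;
}

my @search = ('alpha', 'beta', 'gamma');
my @tests  = ('beta', 'delta', 'alpha');
print "$_\n" for whole_line_matches(\@search, \@tests);    # prints "beta" then "alpha"
```

Note that this only finds exact whole-line matches: 'alph' would not match 'alpha'.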