Sharath

Reputation: 61

Check whether each string listed in one file is present in another file

I'm writing a Perl script that opens a file containing many strings (one string per line), checks whether each of these strings is present in another file (the search file), and prints each occurrence. I have written the code below to find one particular string. How can I extend it to handle a list of strings read from a file?

open(DATA, "<filetosearch.txt") or die "Couldn't open file filetosearch.txt for reading: $!";
my $find = "word or string to find";
#open FILE, "<signatures.txt";
my @lines = <DATA>;
print "Lines that matched $find\n";
for (@lines) {
    if ($_ =~ /$find/) {
        print "$_\n";
    }
}

Upvotes: 0

Views: 2209

Answers (4)

Ted Bear

Reputation: 117

I'd try something like this:

use strict;
use warnings;
use Tie::File;

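tie my @lines, 'Tie::File', 'filetosearch.txt' or die "Couldn't tie filetosearch.txt: $!";
tie my @patterns, 'Tie::File', 'patterns.txt' or die "Couldn't tie patterns.txt: $!";
my @result;
foreach my $pattern (@patterns)
{
    # Quote into a copy: assigning to $pattern would write back to patterns.txt through the tie
    my $quoted = quotemeta $pattern;
    push @result, grep { /$quoted/ } @lines;
}
# Tie::File strips the line endings, so add them back when printing
print "$_\n" for @result;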
  • I use Tie::File because it is convenient (not especially in this case, but in others). Others may disagree, but that is of no importance here.
  • grep is a core function that, in my experience, is very good at what it does.

Upvotes: 1

Kenosis

Reputation: 6204

Here's another option:

use strict;
use warnings;

my $searchFile = pop;
my @strings    = map { chomp; "\Q$_\E" } <>;
my $regex      = '(?:' . ( join '|', @strings ) . ')';

push @ARGV, $searchFile;

while (<>) {
    print if /$regex/;
}

Usage: perl script.pl strings.txt searchFile.txt [>outFile.txt]

The last, optional parameter directs output to a file.

First, the search file's name is (implicitly) popped off @ARGV and saved for later. Then the strings' file is read (<>), and map is used to chomp each line and escape meta-characters (the \Q and \E, in case there are regex characters such as '.' or '*' in the string); the results are stored in an array. The array's elements are joined with the regex alternation character (|) to form, in effect, an OR of all the strings, which is matched against each of the search file's lines. Next, the search file's name is pushed onto @ARGV so its lines can be read. Finally, each line of the search file is printed if one of the strings is found on it (these lines are not chomped, so they keep their newlines when printed).
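If you also need to know which string matched and on which line, the same alternation idea can be extended with a capture group. The following is only a minimal sketch under a few assumptions: build_regex and find_matches are hypothetical helper names, and the two arrays stand in for the contents of the strings file and the search file so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled regex from a list of literal strings.
sub build_regex {
    my @strings = grep { length } @_;    # skip empty lines
    # Longest first, so a longer string wins over a shorter one that is its prefix.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length $b <=> length $a } @strings;
    return qr/($alternation)/;
}

# Hypothetical helper: return [line number, matched string, line] for every hit.
sub find_matches {
    my ($regex, @lines) = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        # /g in a while loop finds every occurrence on the line, not just the first
        while ($line =~ /$regex/g) {
            push @hits, [ $i + 1, $1, $line ];
        }
    }
    return @hits;
}

# In-memory sample data standing in for strings.txt and searchFile.txt (assumed contents).
my @strings = ('foo.bar', 'baz');
my @lines   = ('xx foo.bar yy', 'fooxbar', 'baz and baz');

my $regex = build_regex(@strings);
for my $hit (find_matches($regex, @lines)) {
    printf "line %d: found '%s' in: %s\n", @$hit;
}
```

The second sample line is not reported, because quotemeta makes the '.' in 'foo.bar' match only a literal dot. With real files you would read and chomp the lines first and pass them to the same two subs.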

Hope this helps!

Upvotes: 0

Heto

Reputation: 595

Maybe something like this will do the job:

open FILE1, "<", "filetosearch.txt" or die "Couldn't open filetosearch.txt: $!";
my @arrFileToSearch = <FILE1>;
close FILE1;

open FILE2, "<", "signatures.txt" or die "Couldn't open signatures.txt: $!";
my @arrSignatures = <FILE2>;
close FILE2;

# chomp and escape each signature once, up front; doing it inside the loops
# would escape the same (aliased) element again on every pass
chomp(@arrSignatures);
@arrSignatures = map { quotemeta($_) } @arrSignatures;

for(my $i = 0; defined($arrFileToSearch[$i]);$i++){
    foreach my $signature(@arrSignatures){
        if($arrFileToSearch[$i] =~ /$signature/){
            print $arrFileToSearch[$i];#or any other index that you want, e.g. $i-3
        }
    }

}

Upvotes: 0

woolstar

Reputation: 5083

Ok, something like this will be faster.

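sub testmatch
{
  my ($find, $linesref)= @_ ;

  # \Q...\E matches $find literally, even if it contains regex characters
  for ( @$linesref ) { if ( $_ =~ /\Q$find\E/ ) { return 1 ; } }
  return 0 ;
}

{
  open(DATA, "<filetosearch.txt") or die "Couldn't open filetosearch.txt: $!" ;
  my @lines = <DATA> ;

  open(SRC, "<tests.txt") or die "Couldn't open tests.txt: $!" ;
  while (<SRC>)
  {
    chomp ;  # drop the newline so it is not part of the search string
    if ( testmatch( $_, \@lines )) { print "a match\n" }
  }
}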

If it's matching full line to full line, you can store the lines of one file as keys of a hash and just test for existence:

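{
  open(DATA, "<filetosearch.txt") or die "Couldn't open filetosearch.txt: $!" ;
  my %lines ;
  @lines{<DATA>}= () ;

  open(SRC, "<tests.txt") or die "Couldn't open tests.txt: $!" ;
  while (<SRC>)
  {
     # exists is used instead of the smartmatch operator (~~), which is deprecated in newer Perls
     if (exists $lines{$_}) { print "a match\n" }
  }
}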

Upvotes: 0
