ameyask
ameyask

Reputation: 255

Perl Regex Help - find lines in another file

I want the match to occur but it's not happening.

I have an issue. I wish to find occurences of lines from one file in another.

Here's one file (@file)

735 1 1  
1891 1 0  
2021 1 1  
1892 2 1  
667 1 0  
802 2 1  
665 1 0  
666 1 1  
596 1 0  
3193 2 1  

Here's the one in which I have to find above lines (@file1)

1521 1 0 : 1167 0 0 : 1167 2 0 : 1167 1 0 ;  
2605 1 1 ;  
2280 0 1 : 2280 2 0 : 1892 0 0 : 2280 1 0 : 2021 0 0 ;  
1892 2 1 : 667 0 1 : 667 1 0 ;  
1892 1 1 ;  

Here's the code I wrote

foreach $leadline (@file1) {

  foreach $line (@file) {
    $_    = ' ' . $leadline;
    $line = ' ' . $line;
    if (m/$line/) {
      push @final, $_;
    }

  }

}

But I am unable to detect the lines.

@file1 and @file variables store the contents of the files.

I either get no lines detected or all lines detected.

The reason I am concatenating a Space before the two lines is , that sometimes 667 1 0 can occur as the very first phrase in a given line.I am not comfortable with Regex to do that in Regex directly.

Note :- If line i and line j in first file occur as a pattern in the same line of the other file then output should be just one of the lines. Also , if a pattern 1667 1 0 is found, it shouldn't be confused with 667 1 0. Hence I added the whitespace.

I was able to achieve this goal in Python but unable to replicate it in Perl . Here's the Python snippet :-

for line1 in file1:
    for j in range(0,len(file0)-1):# in file0:
        if ' '+lines[j][0:len(file0[j])-1] in ' '+line1:
            i = i + 1
            print line1[0:len(line1)-1]
            break  

Expected output is :- 1892 2 1 : 667 0 1 : 667 1 0 ;

Upvotes: 0

Views: 142

Answers (3)

Borodin
Borodin

Reputation: 126722

I now think this is a solution to a different problem, but here it is anyway!

use warnings;
use strict;
use 5.010;

use Array::Utils 'array_diff';

open my $fh, '<', 'f1.txt' or die $!;
my @f1;
while ( <$fh> ) {
  push @f1, [split];
}

my @final;
open $fh, '<', 'f2.txt' or die $!;
while ( <$fh> ) {
  my @f2 = map [ /\d+/g ], split /:/;
  for my $f1 ( @f1 ) {
    my @matches = grep { not array_diff(@$f1, @$_) } @f2;
    push @final, map "@$_", @matches;
  }
}

say for @final;

output

1892 2 1
667 0 1
667 1 0

Update

Okay here's my second attempt! This is essentially what choroba wrote but using map and with the addition of stripping all trailing whitespace on the data from the first file.

use warnings;
use strict;
use 5.014;  # For non-destructive substitution

open my $fh, '<', 'f1.txt' or die $!;
my @f1 = map s/\s+\z//r, <$fh>;
my $re = join '|', @f1;

open $fh, '<', 'f2.txt' or die $!;
my @final = grep /\b(?:$re)\b/, <$fh>;

print for @final;

output

1892 2 1 : 667 0 1 : 667 1 0 ;  

Upvotes: 1

Toto
Toto

Reputation: 91415

Here is how I'd do the job:

use Modern::Perl;
use Data::Dumper;$Data::Dumper::Indent = 1;

my @file = (
'735 1 1',
'1891 1 0',
'2021 1 1',
'1892 2 1',
'667 1 0',
'802 2 1',
'665 1 0',
'666 1 1',
'596 1 0',
'3193 2 1',
);
my @final;
while(my $line = <DATA>) {
    chomp $line;
    if (grep{$line =~ /\b$_\b/} @file) {
        push @final, $line;
    }
}
say Dumper\@final;

__DATA__
1521 1 0 : 1167 0 0 : 1167 2 0 : 1167 1 0 ;  
2605 1 1 ;  
2280 0 1 : 2280 2 0 : 1892 0 0 : 2280 1 0 : 2021 0 0 ;  
1892 2 1 : 667 0 1 : 667 1 0 ;  
1892 1 1 ;  

Output:

$VAR1 = [
  '1892 2 1 : 667 0 1 : 667 1 0 ;  '
];

With your files:

use Modern::Perl;
use Data::Dumper;$Data::Dumper::Indent = 1;

open my $fh, '<', 'file.txt' or die "unable to open 'file.txt': $!";
my @file = <$fh>;
chomp @file;

my @final;
open $fh, '<', 'file1.txt' or die "unable to open 'file1.txt': $!";
while(my $line = <$fh>) {
    chomp $line;
    if (grep{$line =~ /\b$_\b/} @file) {
        push @final, $line;
    }
}
say Dumper\@final;

Upvotes: 1

choroba
choroba

Reputation: 241858

You can create a regex by joining the lines from file1 by | (and applying quotemeta on each). \b should prevent matching 667 in 1667.

#!/usr/bin/perl
use warnings;
use strict;

my @search;
open my $F1, '<', 'file1' or die $!;
while (<$F1>) {
    chomp;
    push @search, quotemeta;
}

my $regex = join '|', @search;
$regex = qr/\b(?:$regex)\b/;

open my $F2, '<', 'file2' or die $!;
while (<$F2>) {
    print if /$regex/;
}

Upvotes: 1

Related Questions