guidebortoli

Reputation: 679

Perl: matching files in a directory using an array containing part of the file names

So, I have this directory with files named like this:

HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam

And I have an input.txt file that contains one sample ID on each line:

NA20828  
HG00119

As you can see, the input.txt file has the beginning of the name of the files inside the directory.

What I want to do is pick out the files in the directory whose names (in this case just the beginning of the name) appear in input.txt. I don't know if I was clear, but here is the code I've written so far.

use strict;
use warnings;

my @lines;                              
my @files = glob("*.mapped*");

open (my $input,'<','input.txt') or die $!;         
while (my $line = <$input>) {
    push (@lines, $line);               
}
close $input;

I used the glob to filter only the files with mapped in the name, since I have other files there that I don't want to look for.

I tried some foreach loops, and tried grep and regexes as well. I'm pretty sure I was on the right track, and I think my mistake might be about scope.

I would appreciate any help guys! thanks!

Upvotes: 0

Views: 282

Answers (2)

Borodin

Reputation: 126722

You can build a regular expression from the contents of input.txt like this

my @lines = do {    
    open my $fh, '<', 'input.txt' or die $!;         
    <$fh>;
};
chomp @lines;
my $re = join '|', @lines;

and then find the required files using

my @files = grep /^(?:$re)/, glob '*.mapped*';

Note that if the list in input.txt contains any regex metacharacters, such as ., *, + etc., you will need to escape them, probably using quotemeta like this

my $re = join '|', map quotemeta, @lines;

and it may be best to do this anyway unless you are certain that there will never ever be such characters in the file.
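
Putting the pieces above together, a minimal self-contained sketch might look like the following. To stay runnable without the real directory, it assumes the IDs come from an in-memory string and the file names from a hard-coded list (hypothetical stand-ins for input.txt and the glob call). It also trims trailing whitespace, because chomp only removes the newline and the first ID in the question appears to carry trailing spaces.

```perl
use strict;
use warnings;

# Stand-in for the contents of input.txt (note the trailing spaces on line 1).
my $input_text = "NA20828  \nHG00119\n";

# Stand-in for the result of glob '*.mapped*'.
my @candidates = (
    'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai',
    'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai',
    'NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam',
);

# Read the IDs through an in-memory filehandle, as one would from a real file.
open my $fh, '<', \$input_text or die $!;
my @ids = <$fh>;
close $fh;

# Strip all surrounding whitespace (not just the newline) and drop blank lines.
s/^\s+|\s+$//g for @ids;
@ids = grep { length } @ids;

# An empty alternation would match every file name, so refuse to continue.
die "no IDs found\n" unless @ids;

# Escape each ID, join into one alternation, and anchor at the start of the name.
my $alternation = join '|', map { quotemeta($_) } @ids;
my $re = qr/^(?:$alternation)/;

my @matched = grep { $_ =~ $re } @candidates;
print "$_\n" for @matched;
```

With the real data, the in-memory string would be replaced by opening input.txt and the hard-coded list by glob '*.mapped*'.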

Upvotes: 1

Sobrique

Reputation: 53478

OK, first off - your while loop is redundant. If you read from a filehandle in a list context, it reads the whole thing.

my @lines = <$input>; 

will do the same as your while loop.
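
As a small illustration of the difference between the two contexts, here is a sketch that reads from an in-memory string (an assumption made only so the example runs without input.txt; the variable names are hypothetical).

```perl
use strict;
use warnings;

my $text = "NA20828\nHG00119\nHG00117\n";
open my $in, '<', \$text or die $!;

my $first = <$in>;   # scalar context: reads exactly one line
my @rest  = <$in>;   # list context: reads every remaining line
close $in;

chomp $first;        # chomp accepts a scalar...
chomp @rest;         # ...or a whole array

print "first: $first\n";
print "rest:  @rest\n";
```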

Now, for your patterns - you're matching one list against another list, but with partial matches, so you need to test every file against every line.

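chomp ( @lines );    # remove the trailing newline from every line
foreach my $file ( @files ) {
    foreach my $line ( @lines ) {
        # \Q...\E escapes regex metacharacters; ^ anchors at the start of the name
        if ( $file =~ m/^\Q$line\E/ ) { print "$file matches $line\n"; }
    }
}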

(And yes, something like grep or map can do this, but I always find those two make my head hurt - they're neater, but they're implicitly looping so you don't really gain much algorithmic efficiency).
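
If the lists ever grow large, one alternative that avoids the inner loop entirely is a hash lookup on the sample ID. This is a different technique from the regex matching discussed above, and it rests on an assumption: that the ID is always the text before the first dot in the file name, as in the listing in the question. The data below is hard-coded only to keep the sketch runnable.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the chomped lines of input.txt and the glob result.
my @ids   = ('NA20828', 'HG00119');
my @files = (
    'HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam',
    'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai',
    'NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam',
);

# Build a lookup table so each ID test is a single hash access.
my %wanted = map { $_ => 1 } @ids;

my @matched = grep {
    # Take everything before the first dot as the sample ID.
    my ($id) = split /\./, $_, 2;
    defined $id && $wanted{$id};
} @files;

print "$_\n" for @matched;
```

Unlike the regex approach, this requires the whole first field to equal an ID, so an entry such as HG001 would not match HG00119.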

Upvotes: 1
