RickyG

Reputation: 1

Perl text file grep

I would like to create an array in Perl of strings that I need to search/grep for in a tab-delimited text file. For example, I create the array:

#!/usr/bin/perl -w

use strict;
use warnings;

# array of search terms
my @searchArray = ('10060\t', '10841\t', '11164\t');

I want to have a foreach loop to grep a text file with a format like this:

c18                 10706      463029             K
c2                  10841      91075              G
c36                 11164      .                  B
c19                 11257      41553              C

for each of the elements of the above array. In the end, I want to have a NEW text file that would look like this (continuing this example):

c2                  10841      91075              G
c36                 11164      .                  B

How do I go about doing this? Also, this needs to be able to work on a text file with ~5 million lines, so memory cannot be wasted (I do have 32GB of memory though).

Thanks for any help/advice in advance! Cheers.

Upvotes: 0

Views: 167

Answers (3)

peabody

Reputation: 1

So I'm not the best coder but this should work.

#!/usr/bin/perl

use strict;
use warnings;

my $searchfile = 'file.txt';
my $outfile    = 'outfile.txt';

# array of search terms
my @searchArray = ('10060', '10841', '11164');

open(my $in,  '<', $searchfile) or die "Cannot open $searchfile: $!";
open(my $out, '>', $outfile)    or die "Cannot open $outfile: $!";

# Read line by line and write each match straight away,
# so memory use stays flat even for millions of lines.
while (my $line = <$in>) {
    foreach my $term (@searchArray) {
        # \b stops 10841 from matching inside a longer number such as 110841
        if ($line =~ /\b\Q$term\E\b/) {
            print {$out} $line;
            last;    # write the line once, even if several terms match
        }
    }
}

close($in);
close($out) or die "Cannot close $outfile: $!";

Upvotes: 0

Roobie Nuby

Reputation: 1439

You can search for all the alternatives at once with a regexp like /(10060\t|10841\t|11164\t)/. Since your array could be large, you can build this regexp from the array with something like

my $searchRegex = '(' . join('|', @searchArray) . ')';

This is just a plain string, so it is better (faster) to compile it to a regexp once with

$searchRegex = qr/$searchRegex/;

With only 5 million lines, you could actually pull the entire file into memory (less than a gigabyte at 100 chars/line). Otherwise, you can search line by line with this pattern, as in

while (<>) {
    print if $_ =~ $searchRegex
}
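
Putting the pieces above together, here is a minimal sketch of the whole thing. The helper name build_search_regex is made up for illustration, and the sample rows are fed through an in-memory filehandle so the script runs without any input file; for real data you would open your own input and output files instead. It also assumes the terms are stored without the trailing \t and uses \b for the field boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of literal terms into one compiled pattern.
sub build_search_regex {
    my @terms = @_;
    # quotemeta escapes any regex metacharacters inside the terms
    my $alternation = join '|', map { quotemeta($_) } @terms;
    # \b on both sides stops 10841 from matching inside 110841
    return qr/\b(?:$alternation)\b/;
}

my $searchRegex = build_search_regex('10060', '10841', '11164');

# Sample rows from the question, tab-separated.
my $sample = join '',
    "c18\t10706\t463029\tK\n",
    "c2\t10841\t91075\tG\n",
    "c36\t11164\t.\tB\n",
    "c19\t11257\t41553\tC\n";

# In-memory filehandles stand in for the real input and output files.
my $result = '';
open(my $in,  '<', \$sample) or die "Cannot open in-memory input: $!";
open(my $out, '>', \$result) or die "Cannot open in-memory output: $!";

# One line at a time: only the current line is held in memory.
while (my $line = <$in>) {
    print {$out} $line if $line =~ $searchRegex;
}

close($in);
close($out);

print $result;    # the c2 and c36 rows
```

Because each line is written as soon as it matches, the same loop should cope with a 5-million-line file without holding more than one line at a time.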

Upvotes: 1

Miller

Reputation: 35208

Using a perl one-liner. Just translate your list of numbers into a regex.

perl -ne 'print if /\b(?:10060|10841|11164)\b/' file.txt > newfile.txt
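
Note that \b matches the numbers in any column. If you only care about the second column, or the list of terms grows to thousands of entries, a different technique from the regex above may be worth a look: split each line and look the field up in a hash. This is only a sketch under the assumption that the IDs always sit in the second tab-separated field, as in the sample; line_is_wanted is a made-up helper name and the c5 row is invented to show the difference.

```perl
use strict;
use warnings;

# Search terms as hash keys; a membership test is then a single lookup.
my %wanted = map { $_ => 1 } ('10060', '10841', '11164');

# Hypothetical helper: true if the second tab-separated field is a wanted ID.
sub line_is_wanted {
    my ($line, $wanted) = @_;
    chomp $line;
    my (undef, $id) = split /\t/, $line, 3;    # split only as far as needed
    return defined $id && exists $wanted->{$id};
}

# Rows from the question plus one invented row (c5) whose third column
# happens to contain a search term; it is not printed.
my @rows = (
    "c18\t10706\t463029\tK\n",
    "c2\t10841\t91075\tG\n",
    "c36\t11164\t.\tB\n",
    "c5\t20000\t10841\tA\n",
);

for my $row (@rows) {
    print $row if line_is_wanted($row, \%wanted);
}
```

For a real file, replace the for loop with a while loop over the input filehandle; the hash lookup costs the same no matter how many terms you have.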

Upvotes: 2
