Reputation: 1
I would like to create an array in Perl of strings that I need to search/grep for in a tab-delimited text file. For example, I create the array:
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my @searchArray = ('10060\t', '10841\t', '11164\t');
I want to have a foreach loop to grep a text file with a format like this:
c18 10706 463029 K
c2 10841 91075 G
c36 11164 . B
c19 11257 41553 C
for each of the elements of the above array. In the end, I want to have a NEW text file that would look like this (continuing this example):
c2 10841 91075 G
c36 11164 . B
How do I go about doing this? Also, this needs to be able to work on a text file with ~5 million lines, so memory cannot be wasted (I do have 32GB of memory though).
Thanks for any help/advice in advance! Cheers.
Upvotes: 0
Views: 167
Reputation: 1
So I'm not the best coder but this should work.
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my $searchfile = 'file.txt';
my $outfile = 'outfile.txt';
my @searchArray = ('10060', '10841', '11164');
my @findArray;
open(READ,'<',$searchfile) || die $!;
while (<READ>)
{
    foreach my $term (@searchArray) {
        if (/$term/) {
            chomp ($_);
            push (@findArray, $_);
            last;    # store each line once, even if several terms match
        }
    }
}
close(READ);
### For Console Print
#foreach (@findArray){
# print $_."\n";
#}
open(WRITE,'>',$outfile) || die $!;
foreach (@findArray){
    print WRITE $_."\n";
}
close(WRITE);
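Since the input is around 5 million lines, it may be worth writing each match as soon as it is read instead of collecting everything in @findArray first. Here is a minimal sketch of that idea. The sub name filter_stream is made up for illustration, and it assumes the search terms are plain strings (hence the \Q...\E quoting) rather than regex fragments.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): copy every line
# from $in to $out when it contains at least one of the search terms.
# Returns the number of lines written.
sub filter_stream {
    my ($in, $out, @terms) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        foreach my $term (@terms) {
            # \Q...\E treats the term as literal text (an assumption)
            if ($line =~ /\Q$term\E/) {
                print {$out} $line;
                $count++;
                last;    # write each line once, even if several terms match
            }
        }
    }
    return $count;
}
```

To use it, open 'file.txt' for reading and 'outfile.txt' for writing with lexical filehandles (open my $in, '<', $searchfile or die $!;) and call filter_stream($in, $out, @searchArray). Only one line is held in memory at a time.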
Upvotes: 0
Reputation: 1439
You can search for alternatives by using a regexp like /(10060\t|10841\t|11164\t)/. Since your array could be large, you could build this regexp with something like
$searchRegex = '(' . join('|', @searchArray) . ')';
This is just a plain string, so it would be better (faster) to compile it to a regexp with
$searchRegex = qr/$searchRegex/;
With only 5 million lines, you could actually pull the entire file into memory (less than a gigabyte if 100 chars/line), but otherwise, line by line you could search with this pattern as in
while (<>) {
    print if $_ =~ $searchRegex;
}
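Putting the pieces together, a rough sketch might look like the following. The sub name build_search_regex is hypothetical. It assumes the terms are plain numbers without the trailing \t, and it adds a \b in front so that 10841 does not also match inside 110841. It still matches any column that is followed by a tab, not only the second one.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of plain search terms into one
# compiled regex. A match needs a word boundary before the term and
# a tab right after it.
sub build_search_regex {
    my @terms = @_;
    # quotemeta keeps any regex metacharacters in a term literal
    my $alternation = join '|', map { quotemeta($_) } @terms;
    return qr/\b(?:$alternation)\t/;
}
```

With my $searchRegex = build_search_regex('10060', '10841', '11164'); the while (<>) loop can be used unchanged, for example as perl script.pl file.txt > newfile.txt (script.pl being whatever you name the script).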
Upvotes: 1
Reputation: 35208
Using a Perl one-liner: just translate your list of numbers into a regex.
perl -ne 'print if /\b(?:10060|10841|11164)\b/' file.txt > newfile.txt
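Note that the one-liner matches the numbers anywhere on the line. If they should only be compared against the second tab-separated column, a hash lookup is an alternative. This is a different technique from the regex above, sketched here on the assumption that the ID is always field 2. The sub name second_field_wanted is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the second tab-separated field of
# $line is a key in the hash referenced by $wanted, otherwise 0.
sub second_field_wanted {
    my ($wanted, $line) = @_;
    chomp(my $copy = $line);    # work on a copy so the caller's line keeps its newline
    my (undef, $id) = split /\t/, $copy, 3;
    return (defined $id && exists $wanted->{$id}) ? 1 : 0;
}
```

In a script this could be driven by my %wanted = map { $_ => 1 } qw(10060 10841 11164); followed by while (<>) { print if second_field_wanted(\%wanted, $_) }. Each line then costs one split and one hash lookup, however many search terms there are.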
Upvotes: 2