jinanwow

Reputation: 509

How do I efficiently filter lines of input against an array of data?

I am trying to read a file into a temporary variable, skipping any line that matches an item in an array. Right now I open the file and, inside the while loop that reads it, run a second loop (a very bad idea, IMO) that checks the line against each array item; if one matches, the line is discarded and processing moves on to the next line.

It works, but it's slow with 20,000 lines of input. With an array of 10 items, that is up to 200,000 comparisons, which is essentially like reading a 200,000-line file.

Is there a way to process this more quickly?

Upvotes: 2

Views: 529

Answers (2)

toolic

Reputation: 62037

Assuming you want to discard a line if any item in your array is found, the any function from List::MoreUtils will stop searching through an array as soon as it has found a match.

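use strict;
use warnings;
use List::MoreUtils qw(any);

my @list = qw(red white green);    # placeholder: the items to filter out

while (my $line = <>) {
    # Skip the line as soon as one item matches. Each item is used as a
    # regex; write /\Q$_\E/ instead if the items are literal strings.
    next if any { $line =~ /$_/ } @list;
    # do your processing
}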

If you happen to know which items in your array are more likely to occur in your lines, you could sort your array accordingly.

You should also benchmark your approaches (for example, with the core Benchmark module) to make sure your optimization efforts are worth it.
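As a rough sketch of such a comparison, the script below times an any-based loop against a single precompiled alternation using cmpthese from the core Benchmark module. The sample data, the sub names, and the choice of a combined regex as the second candidate are all assumptions made for illustration. It also uses any from List::Util (available in core since Perl 5.20) as a stand-in for the List::MoreUtils version, and treats the items as literal strings.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);

# Hypothetical sample data: 5 filter items and 2,000 generated lines,
# where every 7th line contains the item "green".
my @list  = qw(red white green blue black);
my @lines = map { "line $_ " . ($_ % 7 ? "plain" : "green") . "\n" } 1 .. 2000;

# One alternation compiled once, with each item escaped to match literally.
my $alternation = join '|', map { quotemeta($_) } @list;
my $combined    = qr/$alternation/;

# Count the lines kept when each line is tested item by item.
sub count_with_any {
    my $kept = 0;
    for my $line (@lines) {
        next if any { $line =~ /\Q$_\E/ } @list;
        $kept++;
    }
    return $kept;
}

# Count the lines kept when each line is tested against the single regex.
sub count_with_regex {
    my $kept = 0;
    for my $line (@lines) {
        next if $line =~ $combined;
        $kept++;
    }
    return $kept;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    any_loop       => \&count_with_any,
    combined_regex => \&count_with_regex,
});
```

The numbers depend heavily on the data and the Perl version, so it is worth running this against a sample of your real input rather than trusting the generated lines.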

Upvotes: 2

Marcelo Cantos

Reputation: 185852

Mash the array items together into a big regex: e.g., if your array is qw{red white green}, use /(red|white|green)/. The $1 variable will tell you which one matched. If you need exact matching, anchor the end-points: /^(red|white|green)$/.
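A minimal sketch of building that regex from the array at run time, assuming the items are literal strings (hence the quotemeta call). The @list contents, the keep_lines helper, and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter items; replace with your own array.
my @list = qw(red white green);

# Escape each item so it matches literally, then compile one alternation.
my $alternation = join '|', map { quotemeta($_) } @list;
my $filter      = qr/($alternation)/;

# Return only the lines that contain none of the items.
sub keep_lines {
    my @lines = @_;
    return grep { $_ !~ $filter } @lines;
}

my @kept = keep_lines("a red door\n", "plain text\n", "greenhouse\n");
print @kept;    # prints "plain text"; "greenhouse" is dropped because it contains "green"
```

Compiling the pattern once with qr// keeps the per-line work to a single match, and the unanchored form matches substrings, which is why "greenhouse" is filtered out here.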

Upvotes: 2
