MattLBeck

Reputation: 5831

Regex as a command line arg for filtering lines with a particular value

I want to be able to take an argument from the command line and use it as a regular expression within my script to filter lines from my file. A simple example:

$ perl script.pl id_4

In script.pl:

...
my $exp = shift;
while(my $line = <$fh>){
    if($line =~ /$exp/){
        print $line;    # $line already ends with "\n"
    }
}
...

My actual script is a bit more complicated and does other manipulations to the line to extract information and produce a different output. My problem is that sometimes I want to filter out every line that contains "id_4", instead of selecting only the lines that contain "id_4". Normally this could be achieved by

if($line !~ /$exp/)

but, if possible, I don't want to change my script to accept a more complex set of arguments (e.g. use !~ if the second parameter is "ne", and =~ otherwise).

Can anyone think of a regex that I can use (besides a long "id_1|id_2|id_3|id_5...") to filter out lines containing one particular value out of many possibilities? I fear I'm asking for something daft here, and should probably just stick to the sensible option and accept a further argument :/.

Upvotes: 0

Views: 661

Answers (2)

TLP

Reputation: 67918

Why choose? Have both.

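my @yes = grep !/^!/, @ARGV;      # patterns a line must match
my @not = grep /^!/,  @ARGV;      # patterns a line must not match
s/^!// for @not;                  # strip the leading "!"
my $exp     = join "|", @yes;
my $exp_not = join "|", @not;

...
# Skip a test when its list is empty: an empty pattern in Perl silently
# reuses the last successful match instead of matching everything.
if (( !@yes || $line =~ $exp ) && ( !@not || $line !~ $exp_not )) {
    # do stuff
}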

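Usage (quote the ! arguments: in an interactive shell such as bash, an unquoted !light triggers history expansion):

perl script.pl orange soda '!light' '!diet'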

Upvotes: 1

Kaz

Reputation: 58598

There is a way to invert regular expressions, so that you can write matches like "all strings which do not contain a match for subexpr". Without operators that express this directly (i.e. using only the basic positive-matching regex operators), it is still possible, but it leads to large and unwieldy regular expressions (possibly a combinatorial explosion in the size of the regex).

For a simple example, see my answer to "Match all letter/number combos but specific word?", which asks how to write a regex that matches everything but the string "help". (That case is quite a simplification, because the match is anchored to the start and end.)
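Perl does have an operator that expresses this directly: the zero-width negative lookahead (?!...). A minimal sketch, assuming Perl 5 and hypothetical sample lines: with the question's loop left unchanged, passing ^(?!.*id_4) as the argument keeps only the lines that do not contain id_4 (quote it on the command line, since ! and ( are special to the shell). The $not_help pattern is an illustrative lookahead version of the "everything but help" example. Lookahead is not part of POSIX regular expressions, so this only helps with engines that support it, such as Perl's or PCRE.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the input file.
my @lines = ("id_1 alpha\n", "id_4 beta\n", "id_5 gamma\n", "x id_4 y\n");

# The argument a user would pass:  perl script.pl '^(?!.*id_4)'
# ^ anchors at the start of the line and (?!.*id_4) is a zero-width
# negative lookahead: it succeeds only if "id_4" occurs nowhere after it.
my $exp = '^(?!.*id_4)';

# Same test as the question's loop: keep a line when it matches $exp.
my @kept = grep { /$exp/ } @lines;
print @kept;    # prints the id_1 and id_5 lines

# Whole-string variant of "everything but help": one or more letters
# or digits, except exactly "help".
my $not_help = qr/^(?!help\z)[[:alnum:]]+\z/;
for my $word (qw(help helpme hel)) {
    printf "%-6s %s\n", $word, ( $word =~ $not_help ? 'match' : 'no match' );
}
```

The filter loop never changes; only the pattern supplied on the command line decides whether lines are selected or excluded.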

Traditional Unix tools have hacks for situations where you just want to invert the match of the expression as a whole: grep versus grep -v, or in vi, :g/pat/ versus :v/pat/, etc. In this way, the implementors ducked out of implementing the difficult regex operators that don't fit into the simple NFA construction approach.
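The grep -v style of whole-match negation can be sketched in Perl as follows. The -v flag, the make_filter helper and the sample lines are all hypothetical, chosen to mirror grep; xor flips the outcome of the ordinary match when the flag is present, so the pattern itself stays simple.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a line filter from command-line style
# arguments. A leading "-v" inverts the whole match, as grep -v does.
sub make_filter {
    my @args   = @_;
    my $invert = ( @args && $args[0] eq '-v' ) ? shift @args : 0;
    my $exp    = shift @args;
    die "usage: script.pl [-v] PATTERN\n" unless defined $exp;
    my $re = qr/$exp/;
    # xor flips the ordinary match result when -v was given. The outer
    # parentheses matter: "return A xor B" parses as "(return A) xor B".
    return sub { my ($line) = @_; return ( ( $line =~ $re ) xor $invert ); };
}

# Hypothetical sample lines standing in for the input file.
my @lines = ("id_1 alpha\n", "id_4 beta\n", "id_5 gamma\n");

# Equivalent to running: perl script.pl -v id_4
my $drop_id4 = make_filter('-v', 'id_4');
my @out = grep { $drop_id4->($_) } @lines;
print @out;    # prints the id_1 and id_5 lines
```

In a real script the filter would be built once with make_filter(@ARGV) and the loop's test would become if ($filter->($line)).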

The easiest option is to do the same: have a convention for coarse-grained negation, with an include pattern and an exclude pattern.

Upvotes: 0
