Reputation: 1313
Trying to wrap my head around look-ahead and look-behind in regex processing.
Let's assume I have a file listing PIDs and other things. I want to build a regex that matches the PID format \d{1,5} but also excludes one specific PID.
$myself = $$;
@file = `cat $FILE`;
@pids = grep /\d{1,5}(?<!$myself)/, @file;
In this regex I try to combine the digit match with the exclusion by adding a negative look-behind, using the (?<!TO_EXCLUDE) construct. This doesn't work.
Sample file:
456
789
4567
345
22743
root
bin
sys
I would appreciate it if someone could point me in the right direction.
I would also like to know whether a negative look-behind is the most efficient approach in this scenario.
Upvotes: 3
Views: 1558
Reputation: 67908
How about:
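chomp(@file);                       # remove newlines that would otherwise mess things up
my @pids = grep /\d{1,5}/, @file;   # keep lines containing digits
my %pids = map { $_ => 1 } @pids;   # no comma after a map BLOCK
delete $pids{$$};                   # delete one specific PID (our own)
@pids = keys %pids;                 # note: keys returns them in no particular order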
I.e., funnel the list of PIDs through a hash and delete your own PID. The lines read from the file must be chomped first; otherwise every key ends in a newline and will not match the PID.
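To see why the chomp matters, here is a small self-contained sketch (not part of the original answer) that runs the same hash approach on the sample lines from the question, once without and once with chomp. The variable $exclude and the helper pids_without are hypothetical; $exclude stands in for $$ so the result is reproducible.

```perl
use strict;
use warnings;

# Sample lines as `cat $FILE` would return them: each ends in "\n".
my @file = map { "$_\n" } qw(456 789 4567 345 22743 root bin sys);

# Hypothetical PID to exclude; stands in for $$ in this sketch.
my $exclude = '22743';

# Hypothetical helper: the hash approach, keep digit lines, drop one key.
sub pids_without {
    my ($skip, @lines) = @_;
    my %pids = map { $_ => 1 } grep { /\d{1,5}/ } @lines;
    delete $pids{$skip};
    return keys %pids;
}

# Without chomp every key is "NNN\n", so delete finds no "22743" key.
my @unchomped = pids_without($exclude, @file);

# With chomp the keys are bare digits and the delete works.
chomp(my @clean = @file);
my @chomped = pids_without($exclude, @clean);

printf "without chomp: %d PIDs, with chomp: %d PIDs\n",
    scalar @unchomped, scalar @chomped;
```

Since keys returns PIDs in no particular order, sort them afterwards if order matters.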
I feel pretty sure there's a module on CPAN that handles processes though.
ETA:
If you are reading the values with readdir, as you mentioned in the comments, something like this might be your best option (untested):
opendir my $dh, "/proc" or die "Cannot open /proc: $!";
my @pids;
while ( my $entry = readdir $dh ) {     # iterate through directory content
    next unless $entry =~ /^\d{1,5}$/;  # skip non-numbers (".", "..", "self", ...)
    next if $entry == $$;               # skip own PID
    push @pids, $entry;
}
closedir $dh;
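The loop can be tried without a real /proc by pointing it at a scratch directory. In this sketch the helper name list_pids, the file names and the excluded value 4242 are all hypothetical; on Linux you would call it with "/proc" and $$ instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: the readdir loop above, with the directory
# and the PID to skip passed in as parameters.
sub list_pids {
    my ($dir, $skip) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @pids;
    while ( my $entry = readdir $dh ) {
        next unless $entry =~ /^\d{1,5}$/;   # skip ".", "..", and non-numbers
        next if $entry == $skip;             # skip the excluded PID
        push @pids, $entry;
    }
    closedir $dh;
    return sort { $a <=> $b } @pids;
}

# Scratch directory that imitates a few /proc entries.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(123 456 4242 self cpuinfo)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my @found = list_pids($dir, 4242);
print "@found\n";
```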
Upvotes: 2
Reputation: 106375
I've upvoted choroba's solution; I just wanted to explain why your original approach didn't work.
See, the regex engine is a complicated beast: it is torn between trying to match as many characters as possible (greediness) and trying to find a match at any cost (backtracking). And the latter, well, usually wins. :)
For example, let's analyze the following:
my $test_line = '22743';
my $pid = '22743';
print 'Matched?', "\n" if $test_line =~ /\d{1,5}(?<!$pid)/;
print $&, "\n";
Why did it print 'Matched?', you may ask? Here's what happened: first the engine consumed all five digits, then tried to match the next subexpression, and failed (that was the point of the negative look-behind, wasn't it?).
If it had been you, you would have stopped there, but not the engine! It still feels that dark desire to match no matter what! So it backtracks to the next possible quantifier count, four digits instead of five, and now, of course, the look-behind is destined to succeed: only four characters precede the current position, so they cannot spell out the five-character PID. That's easy to check by examining what print $& prints: 2274.
Can it still be solved within the realm of regular expressions? Yep, with so-called atomic groups, (?>...), which the engine is not allowed to backtrack into:
print 'No match for ya!', "\n" unless $test_line =~ /(?>\d{1,5})(?<!$pid)/;
But that's usually considered dark magic, I guess. :)
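A small script (not from the original answer) that puts the two patterns side by side on the same input, using $& as the answer does, makes the backtracking visible. The helper name first_match is hypothetical.

```perl
use strict;
use warnings;

my $pid = '22743';

# Hypothetical helper: the matched text, or undef if there is no match.
sub first_match {
    my ($line, $re) = @_;
    return $line =~ $re ? $& : undef;
}

my $backtracking = qr/\d{1,5}(?<!$pid)/;     # may give back digits
my $atomic       = qr/(?>\d{1,5})(?<!$pid)/; # cannot give back inside (?>...)

my $bt    = first_match('22743', $backtracking);
my $at    = first_match('22743', $atomic);
my $other = first_match('456',   $atomic);

for ([ 'backtracking on 22743' => $bt ],
     [ 'atomic on 22743'       => $at ],
     [ 'atomic on 456'         => $other ]) {
    printf "%-22s %s\n", $_->[0], $_->[1] // 'no match';
}
```

The backtracking pattern reports 2274, the atomic one finds no match on 22743, and an unrelated PID such as 456 still matches.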
Upvotes: 5
Reputation: 33908
And if you are curious how it could be done with a regex alone, here are some examples (the first one uses a possessive quantifier, {1,5}+, which needs Perl 5.10 or later):
/\b\d{1,5}+(?<!\b$pid)/
/\b\d{1,5}\b(?<!\b$pid)/
/\b(?!$pid\b)\d+/
/^(?!$pid$)\d+$/
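A quick way to check these patterns is to run each one over the sample lines from the question. In this sketch $pid is a hypothetical fixed value standing in for $$, and the labels in %patterns are made up for readability.

```perl
use strict;
use warnings;

my $pid  = '22743';   # hypothetical PID to exclude (stands in for $$)
my @file = qw(456 789 4567 345 22743 root bin sys);

# The four patterns from the answer, compiled with qr//.
my %patterns = (
    possessive => qr/\b\d{1,5}+(?<!\b$pid)/,
    boundary   => qr/\b\d{1,5}\b(?<!\b$pid)/,
    lookahead  => qr/\b(?!$pid\b)\d+/,
    anchored   => qr/^(?!$pid$)\d+$/,
);

my %kept;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    $kept{$name} = [ grep { $_ =~ $re } @file ];
    print "$name: @{ $kept{$name} }\n";
}
```

Each pattern keeps 456, 789, 4567 and 345 and drops both 22743 and the non-numeric lines. Note that $) is not interpolated inside a pattern, so $pid$) in the last one is read as $pid followed by an end-of-string anchor.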
Upvotes: 4
Reputation: 46
A slightly different way (I try to avoid @file = `cat text.txt`):
my @pids;
open my $fi, "<", "pids.txt" or die "Cannot open pids.txt: $!";
while (<$fi>) {
    if (/(\d{1,5})/) {
        push @pids, $1 if $1 ne $$;   # skip our own PID
    }
}
close $fi;
print join(", ", @pids), "\n";
This is my second post to SO; I hope it's OK to offer an alternative method.
Upvotes: 0
Reputation: 241908
"Look behind" really looks behind. So, you can check whether a PID is preceded by something, not whether it matches something. If you just want to exclude $$, you can be more straightforward:
@file = `cat $FILE`;
@pids = grep /(\d{1,5})/ && $1 ne $$, @file;
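To illustrate both points, here is a short sketch (not from the original answer) that applies the same grep to the sample lines and then shows what a look-behind actually tests. $exclude is a hypothetical stand-in for $$, and the "pid=" prefix is an invented example.

```perl
use strict;
use warnings;

my @file    = map { "$_\n" } qw(456 789 4567 345 22743 root bin sys);
my $exclude = '22743';   # hypothetical stand-in for $$

# The match captures the digits; the second test compares the capture.
my @pids = grep /(\d{1,5})/ && $1 ne $exclude, @file;
chomp @pids;
print join(', ', @pids), "\n";

# A look-behind only inspects text before the current position:
# here it requires the digits to be preceded by "pid=".
my ($after_prefix) = 'pid=22743' =~ /(?<=pid=)(\d+)/;
print "$after_prefix\n";
```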
Upvotes: 6