Reputation: 21

Perl regular expressions explanation

I was hoping to get a little explanation. I have the following script:

open (FILE, '2.txt');
@DNA = <FILE>;
$DNA = join ('', @DNA);

print "DNA = ". $DNA . "\n";

$a=0;
while ($DNA =~ //ig) {$a++;}
print "Total characters = ".$a."\n";

$b=0;
while ($DNA =~ /fl/ig) {$b++;}
print "Total fl = ".$b."\n";

$c=0;
while ($DNA =~ /[^fl]/ig) {$c++;}
print "Total character less fl = ".$c."\n";

exit;

The text document "2.txt" contains the following characters:

flkkkklllkkfewnofnewofewfl

When I run the script I get the following outputs:

DNA = flkkkklllkkfewnofnewofewfl
Total characters = 27
Total fl = 2
Total character less fl = 16

My question is: why, when I do
while ($DNA =~ /fl/ig) {$b++;} does it count all the instances of fl together,

but when I do
while ($DNA =~ /[^fl]/ig) {$c++;} it counts the number of characters that
are neither an f nor an l (i.e. the f and the l are treated separately)?

I was looking for the script to count the number of characters that are not part of fl (i.e. with f and l treated together).

Upvotes: 1

Answers (2)

user557597


[fl] is a character class: it matches a single character that is either f or l.
It does not mean the substring fl.

So [^fl] matches any single character that is neither f nor l, and your loop counts those.
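
As a rough illustration of the difference, here is a small sketch using the sample string from the question. The variable names are made up for this example, and the string is hard-coded rather than read from 2.txt:

```perl
use strict;
use warnings;

# Sample string from the question, hard-coded for illustration.
my $dna = 'flkkkklllkkfewnofnewofewfl';

# /fl/ matches the two-character substring "fl".
my $substring_count = () = $dna =~ /fl/g;

# /[fl]/ matches any single character that is f or l.
my $class_count = () = $dna =~ /[fl]/g;

# /[^fl]/ matches any single character that is neither f nor l.
my $negated_count = () = $dna =~ /[^fl]/g;

print "fl as a substring: $substring_count\n";   # 2
print "f or l:            $class_count\n";       # 10
print "neither f nor l:   $negated_count\n";     # 16
```

The `= () =` construct forces the match into list context so that the number of matches is assigned, which avoids the explicit while loop.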

However, you could count the characters that are not part of the substring fl with a regex like this:

/[^fl]|f(?!l)|(?<!f)l/

Formatted:

    [^fl]          # Neither f nor l
 |  f (?! l )      # f not followed by l
 |  (?<! f ) l     # l not preceded by f
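
A minimal sketch of how that pattern might be used for counting, assuming the sample string from the question. The names `$dna` and `$not_in_fl` are illustrative only, and the /x flag is what allows the spaced-out, commented layout:

```perl
use strict;
use warnings;

# Sample string from the question, hard-coded for illustration.
my $dna = 'flkkkklllkkfewnofnewofewfl';

# Match one character at a time, skipping any f or l that belongs to "fl".
my $not_in_fl = qr/
      [^fl]         # neither f nor l
   |  f (?! l )     # f not followed by l
   |  (?<! f ) l    # l not preceded by f
/xi;

# List-context match returns every match; assigning to a scalar counts them.
my $count = () = $dna =~ /$not_in_fl/g;

print "Total characters less fl = $count\n";   # 22
```

For this input the result is 22: the 26 characters of the string minus the 4 characters that make up the two occurrences of fl.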

Upvotes: 2

ritter

Reputation: 597

To keep it simple, consider removing all the instances of "fl" first, then counting the remaining characters:

$DNA =~ s/fl//g;
print "Total characters less fl = ".length($DNA)."\n";
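
Note that the substitution above changes $DNA itself. If the original string is still needed afterwards, one possible variation is to work on a copy. This is only a sketch: the variable names are illustrative, the trailing newline is an assumption about how the file might be read, and the /i flag is added to mirror the question's patterns:

```perl
use strict;
use warnings;

# Assumed input: the question's string, here with a trailing newline
# as it might arrive when read from a file.
my $dna = "flkkkklllkkfewnofnewofewfl\n";
chomp $dna;   # drop the newline so it is not counted as a character

# Copy first, then substitute on the copy; $dna stays unchanged.
(my $stripped = $dna) =~ s/fl//gi;

print "Total characters less fl = ", length($stripped), "\n";   # 22
```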

Upvotes: 0
