wreggyl

Reputation: 103

Matching German umlauts with a regexp only works every second time

I have a small script that uses a regexp to check whether a string contains German umlauts such as äöüß. The first match against a string works fine, but when I check the same string again, it no longer matches. The file itself is encoded as UTF-8, and I have also enabled the utf8 pragma.

This is the script:

#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use utf8;
use Log4Perl::logger_helper qw( init_logger get_logger_and_trace );

my $strings = ["ä", "ae","ö", "oe", "ü", "ue", "ß", "ss"];

my $logger = init_logger(
    log_file_path => $0 . '.log'
);    # init_logger variables are all optional

foreach my $string (@$strings) {
    for(1..5) {
        if ( $string =~ /[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/gi ) {
            $logger->info("umlauts match $string");
        }
        else {
            $logger->info("no umlauts $string");
        }
    }
}

And this is the output:

umlauts match ä
no umlauts ä
umlauts match ä
no umlauts ä
umlauts match ä
no umlauts ae
no umlauts ae
no umlauts ae
no umlauts ae
no umlauts ae
umlauts match ö
no umlauts ö
umlauts match ö
no umlauts ö
umlauts match ö
no umlauts oe
no umlauts oe
no umlauts oe
no umlauts oe
no umlauts oe
umlauts match ü
no umlauts ü
umlauts match ü
no umlauts ü
umlauts match ü
no umlauts ue
no umlauts ue
no umlauts ue
no umlauts ue
no umlauts ue
umlauts match ß
no umlauts ß
umlauts match ß
no umlauts ß
umlauts match ß
no umlauts ss
no umlauts ss
no umlauts ss
no umlauts ss
no umlauts ss

Process finished with exit code 0

I tested it on different operating systems with different versions of Strawberry Perl; the latest version (strawberry-perl-5.30.0.1-64bit-portable) shows this behavior as well.

Any idea why the match alternates between succeeding and failing? If I do the same check with multiple index operations, it works.

Thanks in advance.

Upvotes: 3

Views: 754

Answers (2)

D. Bachran

Reputation: 31

As @daxim explained, the global flag /g causes havoc here.

From "Regexp Quote-Like Operators" in perlop; the relevant part is the description of m//g in scalar context:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.

Since you repeatedly search within the same $string (without modifying it in between), every second search resumes just after the last successful match. It finds nothing further there, so it fails, and that failure resets the search position, which is why the next search succeeds again.
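To make the alternation visible, here is a minimal sketch that records pos() before and after each match attempt. It uses the character class from the question; the variable names and the "\x{E4}" escape (standing in for a literal ä so the example does not depend on the file encoding) are my own choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "\x{E4}";    # "ä", written as an escape to stay encoding-independent
my @results;

for my $attempt (1 .. 4) {
    # In scalar context, m//g resumes from pos($string), not from the start.
    my $before  = pos($string) // 'undef';
    my $matched = $string =~ /[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/gi ? 1 : 0;
    # A failed match resets pos($string) to undef.
    my $after   = pos($string) // 'undef';
    push @results, "attempt $attempt: before=$before matched=$matched after=$after";
}

print "$_\n" for @results;
```

Attempts 1 and 3 start with an undefined position and match; attempts 2 and 4 start at position 1, which is already past the only character, so they fail and reset the position.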

See also "Global matching" in Using regular expressions in Perl:

The modifier /g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have /g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
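This position tracking is what /g in scalar context is intended for: walking through a string one match at a time. A small sketch, using a made-up sample string ("Grüße aus Köln", written with escapes) that is not from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Gr\x{FC}\x{DF}e aus K\x{F6}ln";    # "Grüße aus Köln"
my @positions;

# Each iteration continues where the previous match ended.
while ($text =~ /[\x{E4}\x{F6}\x{FC}\x{DF}]/g) {
    push @positions, pos($text);    # offset just after the current match
}

print "matches end at offsets: @positions\n";
```

The loop ends when no further match is found, and that final failed match resets the position, so a later loop over the same string would start from the beginning again.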

ciao, daniel :-)

Upvotes: 3

daxim

Reputation: 39158

The problem is the global flag. Remove it.
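As a sketch of what that looks like, here is the check from the question with only /g dropped. The logger is left out and the strings are written as escapes, so this is an illustration rather than the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $umlaut  = "\x{E4}";    # "ä"
my $ascii   = "ae";
my (@umlaut_hits, @ascii_hits);

for (1 .. 5) {
    # Without /g, every match starts at the beginning of the string.
    push @umlaut_hits, ($umlaut =~ /[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/i ? 1 : 0);
    push @ascii_hits,  ($ascii  =~ /[\x{00C4}\x{00E4}\x{00D6}\x{00F6}\x{00DC}\x{00FC}\x{00DF}]/i ? 1 : 0);
}

print "umlaut string: @umlaut_hits\n";
print "ascii string:  @ascii_hits\n";
```

If /g has to stay for some other reason, assigning pos($string) = undef before the match should also restart the search from the beginning.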

Upvotes: 1
