Tomáš Lučenič

Reputation: 49

Perl regex: Substitution of everything but the pattern

In Perl, I would like to use a negated character class to replace everything except a pattern with nothing, so that only the expected string remains. Normally this approach should work, but in my case it doesn't:

$var =~ s/[^PATTERN]//g;

the original string:

$string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>'; 

Desired result: b74ed855-63c9-4795-b5d5-c79dd413d613

(5 hex number groups split with 4 dashes)

my code:

$pattern2keep = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";  

(should match only xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: 5 groups of hex digits separated by 4 dashes, with lengths 8-4-4-4-12)

The following should replace everything but the pattern with nothing, but in fact it does not:

$string =~ s/[^$pattern2keep]//g;

What am I doing wrong please? Thanks.

Upvotes: 1

Views: 1502

Answers (1)

Borodin

Reputation: 126722

A character class matches a single character equal to any one of the characters in the class. If the class begins with a caret then the class is negated, so it matches any one character that isn't any of the characters in the class
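As a small illustration of that rule, here is a sketch with a made-up sample string (not taken from the question). A negated class filters individual characters and has no notion of sequence, so stray hex letters from other words survive:

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $text = 'id=b74e, size=12';

# Delete every single character that is not a lowercase hex digit or a dash.
# The trailing - inside the class is a literal dash.
(my $filtered = $text) =~ s/[^0-9a-f-]//g;

print "$filtered\n";    # prints "db74ee12"
```

The d from "id" and the e from "size" are kept because they are individually allowed, even though they are not part of any identifier.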

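If $pattern2keep is [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} then [^$pattern2keep] does not negate the pattern as a whole. The class ends at the first ], so the regex is parsed as the negated class [^[0-9a-f] (any one character other than [, 0 to 9, or a to f) repeated {8} times, followed by -[0-9a-f]{4} three times, then -[0-9a-f]{12} and a literal ]. Your string contains no ], so nothing matches and the substitution changes nothing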
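A quick way to see what the interpolated regex does is to run it on a copy of the string. In this sketch (the variable $copy is introduced here for illustration), the class closes at the first ], the rest is read as ordinary regex syntax ending in a literal ], and the string comes back unchanged:

```perl
use strict;
use warnings;

my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';

my $pattern2keep = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";

# After interpolation the regex reads:
#   [^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}]
# The negated class is just [^[0-9a-f]; the final ] is a literal character.
(my $copy = $string) =~ s/[^$pattern2keep]//g;

print $copy eq $string ? "unchanged\n" : "changed\n";    # prints "unchanged"
```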

You need to capture the substring, like this

use strict;
use warnings 'all';
use feature 'say';

my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';

my $pattern_to_keep = qr/ \p{hex}{8} (?: - \p{hex}{4} ){3} - \p{hex}{12} /x;

my $kept;

$kept = $1 if $string =~ /($pattern_to_keep)/;

say $kept // 'undef';

output

b74ed855-63c9-4795-b5d5-c79dd413d613
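If a one-step form is preferred, two variations may be worth considering. This is only a sketch: the names $uuid_re and $stripped are made up here, the pattern uses the question's [0-9a-f] classes, and the /r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;
use feature 'say';

my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';

# Same 8-4-4-4-12 shape as in the question, written with /x for readability.
my $uuid_re = qr/ [0-9a-f]{8} (?: - [0-9a-f]{4} ){3} - [0-9a-f]{12} /x;

# Variation 1: a match in list context returns the captures directly.
# $kept is undef when nothing matches.
my ($kept) = $string =~ /($uuid_re)/;

# Variation 2: a substitution that replaces the whole string with the
# captured part. /s lets . match newlines; /r returns a modified copy.
# When nothing matches, the copy is the original string, unchanged.
my $stripped = $string =~ s/\A .*? ($uuid_re) .* \z/$1/xsr;

say $kept // 'undef';    # prints "b74ed855-63c9-4795-b5d5-c79dd413d613"
say $stripped;           # prints "b74ed855-63c9-4795-b5d5-c79dd413d613"
```

The list-context match is usually the safer choice, because a failed match yields undef rather than silently leaving the original text in place.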

Upvotes: 8
