kpda806G

Reputation: 103

regex filter only uppercase consonants

I'm taking an online regex training course. The question is:

With regex you can count the number of matches. Can you make it return the number of uppercase consonants (B,C,D,F,..,X,Y,Z) in a given string? E.g.: it should return 3 with the text ABcDeFO!. Note: Only ASCII. We consider Y to be a consonant! Example: the regex /./g will return 3 when run against the string abc.

My first solution is /[BCDFGHJKLMNPQRSTVWXYZ]/g (26 characters long).

Another solution of mine is /(?![AEIOU])[A-Z]/g, which is 19 characters long. But according to the online statistics, the shortest solution is 16 characters long. Any ideas how to do that?
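To check both candidates against the task's example, here is a minimal PHP sketch (PHP because the answers below use it). It assumes the course counts matches the way preg_match_all does; countMatches is a hypothetical helper name. PHP patterns take the delimiters but no g flag, since preg_match_all is already global, so the flag is stripped before matching.

```php
<?php
// Hypothetical helper: preg_match_all returns the number of matches,
// which is what the course counts. It is global, so no g flag is needed.
function countMatches(string $pattern, string $text): int
{
    return preg_match_all($pattern, $text);
}

$candidates = [
    '/[BCDFGHJKLMNPQRSTVWXYZ]/g', // explicit list of the 21 consonants
    '/(?![AEIOU])[A-Z]/g',        // reject a vowel here, then take any A-Z
];

foreach ($candidates as $course) {
    $php = rtrim($course, 'g'); // drop the course-style g flag for PHP
    echo $course, ': length ', strlen($course),
        ', matches ', countMatches($php, 'ABcDeFO!'), PHP_EOL;
}
```

Both should report 3 matches, with lengths 26 and 19 as stated in the question.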



Upvotes: 7

Views: 4003

Answers (3)

Vladislav

Reputation: 394

My result is based on @pushpesh-kumar-rajwanshi's answer. My first idea was to use [^\0-AEIOU[-ÿ]. [^\1-AEIOU[-ÿ] gives the same result.

But that is 17 characters long. I suspected that \0 or \1 could somehow be written shorter.

Notepad++ can "print" control characters: \0 does not print any character, but \1 prints the SOH symbol. Copy-paste that symbol into regex101.com and you get the SOH character itself, which looks like U+0001.

So the solution is [^U+0001-AEIOU[-ÿ], where U+0001 stands for that single literal SOH character. That makes it 16 characters long.

Update: unfortunately, Stack Overflow strips this character, so it cannot be displayed in the text of this answer.
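Since the literal SOH character cannot be shown here, the following PHP sketch builds it with chr(1) and compares it with the \1 escape form. charLength is a hypothetical helper that counts UTF-8 characters rather than bytes. The u flag is an addition for PHP only, so that PCRE reads ÿ (U+00FF) as one character instead of two UTF-8 bytes; the g flag is appended only for counting the course-style length.

```php
<?php
// Hypothetical helper: with the s and u flags, '.' matches each UTF-8
// character exactly once, so this counts characters, not bytes.
function charLength(string $s): int
{
    return preg_match_all('/./su', $s);
}

$yuml = "\u{FF}"; // ÿ (U+00FF), UTF-8 encoded

// Escape form: \1 inside a class is the octal escape for U+0001 (2 chars).
$escaped = '/[^\1-AEIOU[-' . $yuml . ']/';
// Literal form: the SOH control character itself (1 char).
$literal = '/[^' . chr(1) . '-AEIOU[-' . $yuml . ']/';

// Course-style lengths, counted with the g flag appended.
echo charLength($escaped . 'g'), ' vs ', charLength($literal . 'g'), PHP_EOL; // 17 vs 16

// Both forms match the same characters in the example text.
$text = 'ABcDeFO!';
echo preg_match_all($escaped . 'u', $text), ' ', preg_match_all($literal . 'u', $text), PHP_EOL; // 3 3
```

The only difference is that one escape sequence (two characters) is replaced by the character it denotes (one character), which is where the saved character comes from.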

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

I think I've managed to bring the regex length down to 16, which you say is the minimum length required.

This exploits the fact that the question says:

Note: Only ASCII

A regex using positive or negative lookahead exceeded that length no matter how hard I tried to shorten it, and listing all the allowed uppercase consonants results in a length of 26, which is too much.

Hence the only way seems to be a negated character class that cleverly excludes all the unneeded characters. Here is the regex, which rejects every printable ASCII character except the uppercase consonants:

[^ -AEIOU[-ÿ]

^ marks it as a negated character class, and the range from space to A (" -A") excludes everything between those two characters in the ASCII table (space, punctuation, digits and A itself), none of which is needed. Then we exclude E, I, O and U explicitly. The remaining unneeded ASCII characters are excluded with the [-ÿ range, since [ comes immediately after Z and ÿ is the last character of extended ASCII (Latin-1). The result is a regex that matches only uppercase consonants and rejects the rest of the printable ASCII characters.
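To check this character by character, here is a PHP sketch that runs the pattern against every ASCII code from 0 to 127. ÿ is written as the PCRE escape \xFF (Latin-1 ÿ in PCRE's default byte mode) so the result does not depend on the source file's encoding; PATTERN and matchedAscii are illustrative names, not part of the answer. It suggests the printable range behaves as described, while the control characters below space (0x00 to 0x1F, including tab and newline) are not excluded and would also be counted, which only matters if the input can contain them.

```php
<?php
// The answer's pattern, with ÿ written as the escape \xFF (byte mode).
const PATTERN = '/[^ -AEIOU[-\xFF]/';

// Illustrative helper: collect every ASCII code (0-127) the class matches.
function matchedAscii(string $pattern): array
{
    $hits = [];
    for ($code = 0; $code < 128; $code++) {
        if (preg_match($pattern, chr($code)) === 1) {
            $hits[] = $code;
        }
    }
    return $hits;
}

$hits = matchedAscii(PATTERN);

// Printable hits (space to ~) should be exactly the 21 consonants.
$printable = array_filter($hits, fn (int $c): bool => $c >= 0x20 && $c <= 0x7E);
echo implode('', array_map('chr', $printable)), PHP_EOL; // BCDFGHJKLMNPQRSTVWXYZ

// Control characters below space are not in any excluded range.
echo count($hits) - count($printable), PHP_EOL; // 32
```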

The total length of /[^ -AEIOU[-ÿ]/g is 16, as you expected. Let me know if this works for you.


PHP code:

$s = 'GAsSDITR';
// Count the uppercase consonants G, S, D, T and R
preg_match_all('/[^ -AEIOU[-ÿ]/', $s, $matches);
echo count($matches[0]);

Prints:

5


Upvotes: 6

Andreas

Reputation: 23958

This matches anything that is not one of AEIOU or a-z, and adding \W\d excludes all special characters and digits too. That is 17 characters, and as far as I can see it works on all strings.

$str = 'ABcDeFO!'; // example string from the question
preg_match_all('/[^AEIOUa-z\W\d]/', $str, $m);
var_dump($m);

Outputs:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(1) "B"
    [1]=>
    string(1) "D"
    [2]=>
    string(1) "F"
  }
}

https://3v4l.org/a0ETV
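One ASCII edge case worth checking is the underscore: _ is a "word" character (\w is [A-Za-z0-9_] in PCRE's default mode), so \W does not exclude it, and it is neither a digit nor in a-z. A quick PHP sketch comparing this pattern with the 16-character one on the arbitrary test string B_D (ÿ written as \xFF, byte mode):

```php
<?php
// This answer's pattern and the 16-character one (ÿ written as \xFF).
$andreas  = '/[^AEIOUa-z\W\d]/';
$pushpesh = '/[^ -AEIOU[-\xFF]/';

// _ is a word character, so \W does not remove it from the negated class.
$text = 'B_D';

echo preg_match_all($andreas, $text), PHP_EOL;  // 3: B, _ and D
echo preg_match_all($pushpesh, $text), PHP_EOL; // 2: _ (0x5F) lies in [-\xFF
```

So for input that can contain underscores, adding _ to the class (at the cost of one more character) would be needed.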

Upvotes: 1
