s2 8v

Reputation: 127

Regular expression: match a word of certain length which starts with certain letters

I need a regex that matches a 7-letter word starting with 'st'. For example, out of the following it should match only 'startin': start startin starting

Upvotes: 3

Views: 2824

Answers (3)

AbsoluteƵERØ

Reputation: 7870

To match words made up of unaccented letters regardless of case, either use the i modifier or list both the upper- and lower-case form of each letter in the pattern.

<?php

    $regex = '!\bst[a-z]{5}\b!i';
    $words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
    preg_match_all($regex,$words,$matches);
    print_r($matches[0]);
?>

Output

Array
(
    [0] => startin
    [1] => station
    [2] => Stalker
    [3] => Staples
    [4] => Stiffle
    [5] => Steerin
)

Without the i modifier, you would have to spell out both cases in the character classes to get the same output as above:

$regex = '!\b[Ss][Tt][A-Za-z]{5}\b!';
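A quick way to check that claim is to run both patterns over the same input and compare the results. This is only a sketch: the variable names are made up for illustration, and the accented word is left out so the script does not depend on the file encoding.

```php
<?php
// Illustrative check: the case-insensitive pattern and the explicit
// character-class pattern should return the same matches.
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin";

preg_match_all('!\bst[a-z]{5}\b!i', $words, $withModifier);
preg_match_all('!\b[Ss][Tt][A-Za-z]{5}\b!', $words, $withClasses);

var_dump($withModifier[0] === $withClasses[0]); // bool(true)
print_r($withClasses[0]);
```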

If you also want to match non-ASCII (Unicode) characters, add the u modifier and allow code points outside the ASCII range:

print "<meta charset=\"utf-8\"><body>";

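    // \x{...} is the PCRE code-point escape; a bare u0000 would be read as the literal characters u, 0, 0, ...
    $regex = '!\bst([a-z]|[^\x{0000}-\x{0080}]){5}\b!iu';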

    $words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";

    preg_match_all($regex,$words,$matches);

    print_r($matches[0]);

print "</body>";    

Output

Array
(
    [0] => startin
    [1] => station
    [2] => Stalker
    [3] => Staples
    [4] => Stiffle
    [5] => Steerin
    [6] => StÄbles // without UTF-8 output it is displayed as StÃ„bles
)

Upvotes: 1

Alex Shesterov

Reputation: 27525

General tips:

  • The starting symbols go into the regex directly, e.g. st. If the starting characters have a special meaning in regex syntax (dots, parentheses, etc.), you need to escape them with a backslash, but that is not needed in your case.

  • After the starting symbols, include a character class for the remaining characters of your "word". If you want to allow all characters, use a dot (.). If you want to allow all non-whitespace characters, use \S. If you want to allow only (Unicode) letters, use \p{L}. To allow only non-accented Latin letters, use [A-Za-z]. There are many possibilities here.

  • Finally, include a repetition quantifier for the character class from the previous step. In your case, you need exactly 5 characters after st, so the repetition quantifier is {5}.

  • If you want only the whole string to match, use \A at the beginning and \z at the end of your regex. Or include \b at the beginning/end of your regex to match at the so-called word boundaries (including start/end of the string, whitespace, punctuation). The most powerful alternative (with full control) is the so-called lookahead - I'll leave it out here for the sake of simplicity.
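Put together, the tips above could be sketched roughly like this in PHP. The helper name prefixWordPattern and its parameters are hypothetical, not part of any library, and the sketch assumes the prefix consists of single-byte (ASCII) characters so that strlen() counts its letters correctly.

```php
<?php
// Hypothetical helper (for illustration only): builds a pattern for a word of
// $length letters that starts with $prefix. Assumes an ASCII-only prefix.
function prefixWordPattern(string $prefix, int $length, bool $wholeString = false): string
{
    // Tip 1: escape the prefix in case it contains regex metacharacters ('!' is the delimiter).
    $start = preg_quote($prefix, '!');
    // Tips 2 and 3: a letters-only class plus a quantifier for the remaining characters.
    $rest = '\p{L}{' . ($length - strlen($prefix)) . '}';
    // Tip 4: anchor either to the whole string or to word boundaries.
    [$open, $close] = $wholeString ? ['\A', '\z'] : ['\b', '\b'];
    return '!' . $open . $start . $rest . $close . '!iu';
}

preg_match_all(prefixWordPattern('st', 7), 'start startin starting', $m);
print_r($m[0]); // only "startin"

var_dump(preg_match(prefixWordPattern('st', 7, true), 'startin'));          // int(1)
var_dump(preg_match(prefixWordPattern('st', 7, true), 'startin starting')); // int(0)
```

Swapping \p{L} for [A-Za-z], \S or a dot changes which characters count as part of the "word"; the rest of the pattern stays the same.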

See this tutorial for details. You can just look for specific keywords I've mentioned, e.g. repetition, character class, unicode, lookahead, etc.

Upvotes: 5

David

Reputation: 6571

preg_match_all('/\bst\w{5}\b/', 'start startin starting', $arr, PREG_PATTERN_ORDER);

UPDATE: used word boundaries before and after, based on comment
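One thing to keep in mind with this pattern: \w matches digits and the underscore as well as letters, so it accepts more than strictly "letter" words. A small comparison on made-up input (the sample string and variable names are only for illustration):

```php
<?php
// Made-up input to compare \w with a letters-only character class.
$text = 'startin st12345 st_abcd station';

preg_match_all('/\bst\w{5}\b/', $text, $wordChars);
preg_match_all('/\bst[a-z]{5}\b/i', $text, $lettersOnly);

print_r($wordChars[0]);   // startin, st12345, st_abcd, station
print_r($lettersOnly[0]); // startin, station
```

If the input is known to contain only letters, the two behave the same.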

Upvotes: 0
