Reputation: 127
I need a regex which matches a 7-letter word that starts with 'st'.
For example, it should only match 'startin' out of the following: start startin starting
Upvotes: 3
Views: 2824
Reputation: 7870
To match words made of non-accented letters regardless of case, you need either the i modifier or character classes that list both the uppercase and lowercase form of each letter.
<?php
$regex = '!\bst[a-z]{5}\b!i';
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
preg_match_all($regex,$words,$matches);
print_r($matches[0]);
?>
Output
Array
(
[0] => startin
[1] => station
[2] => Stalker
[3] => Staples
[4] => Stiffle
[5] => Steerin
)
Without the i modifier, you would have to spell out both cases in the character classes to get the same output as above:
$regex = '!\b[Ss][Tt][A-Za-z]{5}\b!';
If you want to match Unicode Characters you can do this:
print "<meta charset=\"utf-8\"><body>";
$regex = '!\bst\p{L}{5}\b!iu'; // \p{L} matches any Unicode letter; the u modifier turns on UTF-8 mode
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
preg_match_all($regex,$words,$matches);
print_r($matches[0]);
print "</body>";
Output
Array
(
[0] => startin
[1] => station
[2] => Stalker
[3] => Staples
[4] => Stiffle
[5] => Steerin
[6] => StÄbles // without UTF-8 output it looks like this: StÃ„bles
)
Upvotes: 1
Reputation: 27525
General tips:
The starting symbols are included in the regex directly, e.g. st. If the starting characters are special in regex syntax (like dots, parentheses, etc.), you need to escape them with a backslash, but that is not needed in your case.
After the starting symbols, include a character class for the remaining characters of your "word". If you want to allow all characters, use a dot (.). If you want to allow all non-whitespace characters, use \S. If you want to allow only (Unicode) letters, use \p{L}. To allow only non-accented Latin letters, use [A-Za-z]. There are many possibilities here.
Finally, include a repetition quantifier for the character class from the previous step. In your case, you need exactly 5 characters after st, so the repetition quantifier is {5}.
If you want only the whole string to match, use \A at the beginning and \z at the end of your regex. Or include \b at the beginning and end of your regex to match at so-called word boundaries, i.e. positions between a word character and a non-word character (whitespace, punctuation, or the start/end of the string).
The most powerful alternative (with full control) is the so-called lookahead - I'll leave it out here for the sake of simplicity.
See this tutorial for details. You can just look for specific keywords I've mentioned, e.g. repetition, character class, unicode, lookahead, etc.
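To make the tips above concrete, here is a minimal sketch that plugs different character classes into the same st + {5} pattern. The helper name matchSevenCharWords and the sample text are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: "st" followed by exactly 5 characters of the given
// class, with \b on both sides so only whole words match.
function matchSevenCharWords(string $charClass, string $text): array
{
    $regex = '/\bst' . $charClass . '{5}\b/u';
    preg_match_all($regex, $text, $matches);
    return $matches[0];
}

// Made-up sample input (assumes this source file is UTF-8 encoded).
$text = 'start startin starting st12345 stÄbles';

$asciiLetters   = matchSevenCharWords('[A-Za-z]', $text); // non-accented Latin letters only
$unicodeLetters = matchSevenCharWords('\p{L}', $text);    // any Unicode letter
$nonWhitespace  = matchSevenCharWords('\S', $text);       // anything except whitespace

print_r($asciiLetters);   // startin
print_r($unicodeLetters); // startin, stÄbles
print_r($nonWhitespace);  // startin, st12345, stÄbles

// \A and \z tie the pattern to the whole string instead of a word inside it.
$wholeString  = preg_match('/\Ast\p{L}{5}\z/u', 'startin');     // 1
$insideString = preg_match('/\Ast\p{L}{5}\z/u', 'say startin'); // 0
```

The stricter the character class, the fewer candidates survive, so which one is right depends on what you count as a "letter".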
Upvotes: 5
Reputation: 6571
preg_match_all('/\bst\w{5}\b/', 'start startin starting', $arr, PREG_PATTERN_ORDER);
UPDATE: used word boundaries before and after, based on comment
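A small sketch of why the word boundaries matter, using the same input as the question. The variable names are only for illustration. It also shows that \w is broader than letters: it accepts digits and underscores too.

```php
<?php
$text = 'start startin starting';

// Without \b the pattern also matches the first 7 characters of "starting".
preg_match_all('/st\w{5}/', $text, $unbounded);

// With \b on both sides only complete 7-character words match.
preg_match_all('/\bst\w{5}\b/', $text, $bounded);

print_r($unbounded[0]); // startin, startin
print_r($bounded[0]);   // startin

// \w also matches digits and the underscore, so this counts as a match.
$digitsAndUnderscore = preg_match('/\bst\w{5}\b/', 'st12_45'); // 1
```

If digits and underscores should not count, swap \w for [a-z] with the i modifier or for \p{L} with the u modifier.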
Upvotes: 0