Reputation: 57196
I was trying to write an regex that allows single hyphens and single spaces only within words but not at the beginning or at the end of the words.
I thought I have this sorted from the answer I got yesterday, but I just realised there is small error which I don't quite understand,
Why it won't accept the inputs like,
'forum-category-b forum-category-a'
'forum-category-b Counter-terrorism'
'forum-category-a Preventing'
'forum-category-a Preventing Violent'
'forum-category-a International-Research-and-Publications'
'International-Research-and-Publications forum-category-b forum-category-a'
but it takes,
'forum-category-b'
'Counter-terrorism forum-category-a'
'Preventing forum-category-a'
'Preventing Violent forum-category-a'
'International-Research-and-Publications forum-category-b'
Why is that? How can I fix it? It Below is the regex with the initial test, but ideally it should accept all the combination inputs above,
$aWords = array(
'a',
'---stack---over---flow---',
' stack over flow',
'stack-over-flow',
'stack over flow',
'stacoverflow'
);
foreach($aWords as $sWord) {
if (preg_match('/^(\w+([\s-]\w+)?)+$/', $sWord)) {
echo 'pass: ' . $sWord . "\n";
} else {
echo 'fail: ' . $sWord . "\n";
}
}
accept/ to reject the input like these below,
---stack---over---flow---
stack-over-flow- stack-over-flow2
stack over flow
Thanks.
Upvotes: 1
Views: 4588
Reputation: 67157
Your pattern does not do what you want. Let's break it apart:
^(\w+([\s-]\w+)?)+$
It matches strings that consist solely of one or more sequences of the pattern:
\w+([\s-]\w+)?
...which is a sequence of word characters, followed optionally by one other sequence of word characters, separated by one space or dash character.
In other words, your pattern searches for strings like:
xxx-xxxyyy-yyyzzz zzz
...but you intent to write a pattern that would find:
xxx-xxxxxx-xxxxxx yyy
In your examples, this one is matched:
Counter-terrorism forum-category-a
...but it is interpreted as the following sequence:
(Counter(-terroris)) (m( foru)) (m(-categor) (y(-a))
As you can see, the pattern did not really find the words you are looking for.
This example is not matched:
forum-category-a Preventing Violent
...since the pattern cannot form groups of "word characters, space-or-dash, word-characters" when it encounters a single word character followed by space or dash:
(forum(-categor)) (y(-a)) <Mismatch: Found " " but expected "\w">
If you would add another character to "forum-category-a", say "forum-category-ax", it would match again, since it could split at the "ax":
(forum(-categor)) (y(-a)) (x( Preventin)) (g( Violent))
What you are actually interested in is a pattern like
^(\w+(-\w+)*)(\s\w+(-\w+)*)*$
...which would find a sequence of words that may contain dashes, separated by spaces:
(forum(-category)(-a)) ( Preventing) ( Violent)
By the way, I tested this using a Python script, and while trying to match your pattern against the example string "International-Research-and-Publications forum-category-b forum-category-a", the regular expression engine seemed to run into an infinite loop...
import re
expr = re.compile(r'^(\w+([\s-]\w+)?)+$')
expr.match('International-Research-and-Publications forum-category-b forum-category-a')
Upvotes: 2
Reputation:
There should be only one answer to this problem:
/^((?<=\w)[ -]\w|[^ -])+$/
There is only 1 rule as stated \w[ -]\w
and thats it. And its on a per character basis granularity, and cannot be anthing else. Add the [^ -] for the rest.
Upvotes: 0
Reputation: 101614
the part of your pattern ([\s-]\w+)?
is the issue. It's only allowing for one repetition (the trailing ?
). Try changing the last ?
to *
and see if that helps.
Nope, I still believe that's the problem. The original pattern is looking for "word" or "word[space_hyphen]word" repeated 1+ times. Which is weird because the pattern should fall within another match. But switching the question mark worked for me.
Upvotes: 0