Reputation: 7380
Consider a collection of items. Each item may have zero, one, or many tags assigned. A tag name may consist of any valid Unicode character except whitespace (space, newline, ...). The tag property of each item is a space-separated list of tags, e.g. tag1 tag2 tag3.
I am currently working on a PHP function that filters the items for those that contain a certain set of tags and do not contain certain others.
Currently, I generate a regular expression like
/^(?=.*\bfoo\b)(?=.*\bbar\b)(?!.*\bbaz\b).*$/
out of the search query. This expression matches all tag properties that contain both foo and bar but not baz. This works perfectly as long as the tags start and end with a word character, but it stops working otherwise (e.g. for tags starting or ending with a dot or hash sign), because the word boundary anchor \b only matches between a word character and a non-word character.
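The failure can be reproduced with a minimal sketch. It assumes the tag is escaped with preg_quote() before being placed into the pattern, which the question does not state; the sample strings are made up for illustration.

```php
<?php
// The pattern from the question, reduced to one required tag: ".foo#".
// Assumption: the tag is escaped with preg_quote() for use inside /.../.
$tag = preg_quote('.foo#', '/');
$pattern = '/^(?=.*\b' . $tag . '\b).*$/';

// Fails although the tag is present: "\b" in front of "." needs a word
// character on the other side, but there is only the start of the string.
var_dump(preg_match($pattern, '.foo# bar'));   // int(0)

// Matches only when word characters happen to touch the tag,
// which is not a tag boundary at all.
var_dump(preg_match($pattern, 'x.foo#y'));     // int(1)
```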
Do you have any idea how I can modify the regular expression for tags like .foo#?
The solution should be supported on PHP 5.5+.
Upvotes: 2
Views: 82
Reputation: 887
A working example:
^(?=.*(?<!\S)foo@(?!\S).*)(?!.*(?<!\S)_bar#(?!\S).*).*
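Instead of a word boundary, I've asserted that there is no non-whitespace character directly before or after the tag: (?<!\S) in front of it and (?!\S) behind it. A word boundary is effectively a combination of two lookarounds; here you only need one half of it on each side of the tag.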
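Since the question generates the pattern from a search query, the same lookarounds could be assembled in a loop. This is only a sketch: the function name buildTagPattern and its two parameters are hypothetical, the u modifier assumes the tag property is valid UTF-8, and the array() syntax is used so that it should also run on PHP 5.5.

```php
<?php
/**
 * Build a filter pattern from required and forbidden tags.
 * Function and parameter names are hypothetical, for illustration only.
 */
function buildTagPattern(array $required, array $forbidden)
{
    $pattern = '/^';
    foreach ($required as $tag) {
        // (?<!\S) / (?!\S): no non-whitespace character directly
        // before / after the tag; preg_quote() escapes ".", "#", etc.
        $pattern .= '(?=.*(?<!\S)' . preg_quote($tag, '/') . '(?!\S))';
    }
    foreach ($forbidden as $tag) {
        $pattern .= '(?!.*(?<!\S)' . preg_quote($tag, '/') . '(?!\S))';
    }
    // "u" assumes the tag property is valid UTF-8
    return $pattern . '.*$/u';
}

$pattern = buildTagPattern(array('.foo#', 'bar'), array('baz'));

var_dump(preg_match($pattern, 'bar .foo# qux'));   // int(1)
var_dump(preg_match($pattern, 'bar .foo#x'));      // int(0), ".foo#x" is a different tag
var_dump(preg_match($pattern, '.foo# bar baz'));   // int(0), forbidden tag present
```

The trailing .* inside each lookahead of the example above is left out here, because a lookahead succeeds as soon as the tag itself has been found.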
Upvotes: 0
Reputation: 16968
I assume you are generating your pattern. If so, you can use a pattern like this:
/^(?=.*(\W|^)foo(\W|$))(?=.*(\W|^)bar(\W|$))(?!.*(\W|^)baz(\W|$)).*$/
If not, you can simply put those characters outside the \bfoo\b, like \.\bfoo\b#.
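One thing that may be worth checking before adopting the \W variant: \W treats punctuation as a delimiter too, so a search for foo would also hit an item tagged .foo#. The sketch below compares it with a whitespace-based lookaround; the variable names and the sample tag string are made up for illustration.

```php
<?php
// Single required tag "foo", once delimited by \W and once by whitespace lookarounds.
$withNonWord    = '/^(?=.*(\W|^)foo(\W|$)).*$/';
$withWhitespace = '/^(?=.*(?<!\S)foo(?!\S)).*$/';

// The item carries the tags ".foo#" and "bar", but not "foo".
$tags = '.foo# bar';

var_dump(preg_match($withNonWord, $tags));     // int(1), "." and "#" count as \W
var_dump(preg_match($withWhitespace, $tags));  // int(0)
```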
Upvotes: 2