muffel

Reputation: 7380

PHP regular expressions: match several inclusion / exclusion rules for matches consisting of a complex character set

Consider a collection of items. Each item may have zero, one, or many tags assigned. A tag name may consist of any valid Unicode character except whitespace (space, newline, ...). The tag property of each item is a space-separated list of tags, e.g. tag1 tag2 tag3.

I am currently working on a PHP function that filters the items, keeping only those that contain a certain set of tags and do not contain certain others.

Currently, I generate a regular expression like

/^(?=.*\bfoo\b)(?=.*\bbar\b)(?!.*\bbaz\b).*$/

out of the search query. This expression matches all tag properties that contain both foo and bar but not baz. This works perfectly as long as the tags start and end with a word character, but stops working otherwise (e.g. for tags starting or ending with a dot or hash sign), because the word boundary anchor \b only matches between a word character and a non-word character.
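A minimal sketch of the failure, assuming the lookahead is tested on its own; the sample tag lists are made up for illustration:

```php
<?php
// Works: "foo" starts and ends with a word character.
$plain = preg_match('/^(?=.*\bfoo\b).*$/', 'tag1 foo tag3');

// Fails: there is no word boundary between the space and ".",
// so the tag ".foo#" is never found.
$dotted = preg_match('/^(?=.*\b\.foo#\b).*$/', 'tag1 .foo# tag3');

// False positive: searching for "foo" also hits the tag ".foo#",
// because \b matches between "." and "f" and between "o" and "#".
$falsePositive = preg_match('/^(?=.*\bfoo\b).*$/', 'tag1 .foo# tag3');

var_dump($plain);         // int(1)
var_dump($dotted);        // int(0)
var_dump($falsePositive); // int(1)
```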

Do you have any idea how I can modify the regular expression for tags like .foo#?

The solution should be supported on PHP 5.5+.

Upvotes: 2

Views: 82

Answers (2)

linden2015

Reputation: 887

A working example:

^(?=.*(?<!\S)foo@(?!\S).*)(?!.*(?<!\S)_bar#(?!\S).*).*

Instead of a word boundary, I assert that there is no non-whitespace character directly before or after the tag. A word boundary is effectively a combination of two lookarounds; in this case you only need one kind of the two, used twice (once before and once after the tag).

  • Flags: g, m
  • Steps: 270
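Since the question generates the pattern from a search query, the lookaround idea above could be wrapped in a small builder. This is only a sketch: the function name buildTagPattern and its two array parameters are hypothetical, and it assumes the tag property is valid UTF-8. Tag names are passed through preg_quote so that characters such as "." are matched literally.

```php
<?php
/**
 * Hypothetical helper: builds a pattern that requires every tag in
 * $required and rejects every tag in $forbidden.
 */
function buildTagPattern(array $required, array $forbidden)
{
    $pattern = '^';
    foreach ($required as $tag) {
        // Positive lookahead: the tag occurs with no non-space neighbour.
        $pattern .= '(?=.*(?<!\S)' . preg_quote($tag, '/') . '(?!\S))';
    }
    foreach ($forbidden as $tag) {
        // Negative lookahead: the tag must not occur as a whole tag.
        $pattern .= '(?!.*(?<!\S)' . preg_quote($tag, '/') . '(?!\S))';
    }
    // "s" lets "." cross newlines, "u" treats pattern and subject as UTF-8.
    return '/' . $pattern . '/su';
}

$pattern = buildTagPattern(['.foo#', 'bar'], ['baz']);

var_dump(preg_match($pattern, 'tag1 .foo# bar'));   // int(1)
var_dump(preg_match($pattern, 'x.foo# bar'));       // int(0), ".foo#" is only part of a tag
var_dump(preg_match($pattern, '.foo# bar baz'));    // int(0), forbidden tag present
var_dump(preg_match($pattern, '.foo# bar bazaar')); // int(1), "bazaar" is a different tag
```

The short array syntax needs PHP 5.4 or later, so the sketch stays within the PHP 5.5+ requirement.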


Upvotes: 0

shA.t

Reputation: 16968

I assume you are generating your pattern. If so, you can use a pattern like this:

/^(?=.*(\W|^)foo(\W|$))(?=.*(\W|^)bar(\W|$))(?!.*(\W|^)baz(\W|$)).*$/


If not, you can simply move those characters outside the word boundaries, turning \bfoo\b into \.\bfoo\b#.

Upvotes: 2
