Horatiu Paraschiv

Reputation: 1780

How do I build a regex to match a pattern while excluding certain known words that would match the pattern

How do I build a regex that matches a pattern while excluding certain known words that would also match it? For example, I have this string:

I like to d.r.e.a.m at going to do h i k i n g.

and I have the following regex: \b(.{1,2}(\s|.|-|_)){2,}

This matches:

to d.r.e.a.m at

to do h i k i n g.

I want to change this regex so that it matches only:

d.r.e.a.m

h i k i n g.

If I change it to \b([^(to)]{1,2}(\s|.|-|_)){2,}

it partially works, but it excludes the individual letters 't' and 'o' instead of the entire word 'to'.

How can I solve this?

Upvotes: 2

Views: 62

Answers (1)

Wiktor Stribiżew

Reputation: 627082

You may use

/\b(?!(?:I|at|[td]o)\b)\w{1,2}(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*\b/

See this Rubular demo

It matches

  • \b - a word boundary
  • (?!(?:I|at|[td]o)\b)\w{1,2} - a word of 1 or 2 word chars, provided the negative lookahead confirms it is not the whole word I, at, to or do
  • (?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})* - zero or more repetitions of:
    • [\W_] - a non-word char or _
    • (?!(?:I|at|[td]o)\b)\w{1,2} - see above
  • \b - a word boundary.
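
As a rough sketch of how the pattern above could be applied and generalized, the Ruby snippet below builds the same regex from a list of excluded words and runs it on the sample sentence from the question. The names STOP_WORDS and spaced_word_pattern are hypothetical and only serve the illustration; the word list is the one used in the pattern above.

```ruby
# Words that fit the 1-2 character shape but must not be part of a match.
# (Hypothetical constant; the list comes from the sample sentence.)
STOP_WORDS = %w[I at to do].freeze

# Build the pattern for an arbitrary list of excluded words.
# (Hypothetical helper, not part of any library.)
def spaced_word_pattern(stop_words)
  excluded = Regexp.union(stop_words)  # escapes each word and joins them with |
  chunk = /(?!#{excluded}\b)\w{1,2}/   # 1-2 word chars that are not an excluded whole word
  /\b#{chunk}(?:[\W_]#{chunk})*\b/     # chunks joined by a single non-word char or _
end

text = "I like to d.r.e.a.m at going to do h i k i n g."
matches = text.scan(spaced_word_pattern(STOP_WORDS))
p matches  # => ["d.r.e.a.m", "h i k i n g"]
```

Two things to keep in mind with this sketch: the trailing period after "h i k i n g" is not part of the match because the pattern ends with \b, and the exclusion is case-sensitive, so "I" is excluded while the lowercase "i" inside "h i k i n g" is still allowed. Any other 1 or 2 letter word that is not in the list would still match on its own, so the list needs to cover the short words expected in the input.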

Upvotes: 2
