Luke Baulch

Reputation: 3656

Regex - Find all matching words that don't begin with a specific prefix

How would I construct a regular expression to find all words that end with one string but don't begin with another?

e.g. Find all words that end in 'friend' that don't start with the word 'girl' in the following sentence:

"A boyfriend and girlfriend gained a friend when they asked to befriend them"

The words 'boyfriend', 'friend', and 'befriend' should match. The word 'girlfriend' should not.

Upvotes: 25

Views: 42125

Answers (5)

Basheer AL-MOMANI

Reputation: 15327

In my case, I needed to exclude words with a given prefix from the regex matches.

The text was a set of query-string parameters:

?=&sysNew=false&sysStart=true&sysOffset=4&Question=1

The prefix is sys, and I don't want the words that begin with it.

The key to solving the issue was the word boundary \b:

\b(?!sys)\w+\b

Then I used that part in the larger regex for the query string:

(\b(?!sys)\w+\b)=(\w+)
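
As a rough illustration, here is a minimal JavaScript sketch that runs this pattern over the sample query string. The names query, keyValue, and pairs are made up for the example, and String.prototype.matchAll assumes an ES2020-capable runtime.

```javascript
// Sample query string from the answer above.
const query = '?=&sysNew=false&sysStart=true&sysOffset=4&Question=1';

// Group 1: a key that does not start with "sys"; group 2: its value.
const keyValue = /(\b(?!sys)\w+\b)=(\w+)/g;

// Collect [key, value] pairs for every match.
const pairs = [...query.matchAll(keyValue)].map((m) => [m[1], m[2]]);

console.log(pairs); // [ [ 'Question', '1' ] ]
```

The values (false, true, 4) are not picked up as keys because each is followed by & rather than =.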

Upvotes: 0

Rob Raisch

Reputation: 17357

Off the top of my head, you could try:

\b             # word boundary - matches start of word
(?!girl)       # negative lookahead for literal 'girl'
\w*            # zero or more letters, numbers, or underscores
friend         # literal 'friend'
\b             # word boundary - matches end of word
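
JavaScript has no free-spacing mode, so a sketch there has to collapse the commented pattern onto one line. The names sentence, endsInFriend, and words are illustrative only.

```javascript
const sentence =
  'A boyfriend and girlfriend gained a friend when they asked to befriend them';

// Same pattern as above, written without whitespace and comments.
const endsInFriend = /\b(?!girl)\w*friend\b/g;

// With the g flag, match() returns every matched substring (or null if none).
const words = sentence.match(endsInFriend) || [];

console.log(words); // [ 'boyfriend', 'friend', 'befriend' ]
```

Note that the lookahead only inspects the start of the word, so a word such as 'supergirlfriend' would still match.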

Update

Here's another non-obvious approach which should work in any modern implementation of regular expressions:

Assuming you wish to extract a pattern which appears within multiple contexts but you only want to match it in a specific context, you can use an alternation where you first specify what you don't want and then capture what you do.

So, using your example, to extract all of the words that either are or end in friend except girlfriend, you'd use:

\b               # word boundary
(?:              # start of non-capture group 
  girlfriend     # literal (note 1)
|                # alternation
  (              # start of capture group #1 (note 2)
    \w*          # zero or more word chars [a-zA-Z0-9_]
    friend       # literal 
  )              # end of capture group #1
)                # end of non-capture group
\b

Notes:

  1. This is what we do not wish to capture.
  2. And this is what we do wish to capture.

Which can be described as:

  • for all words
  • first, match 'girlfriend' and do not capture (discard)
  • then match any word that is or ends in 'friend' and capture it

In Javascript:

const target = 'A boyfriend and girlfriend gained a friend when they asked to befriend them';

const pattern = /\b(?:girlfriend|(\w*friend))\b/g;

let result = [];
let arr;

while((arr=pattern.exec(target)) !== null){
  if(arr[1]) {
    result.push(arr[1]);
  }
}

console.log(result);

which, when run, will print:

[ 'boyfriend', 'friend', 'befriend' ]

Upvotes: 29

nl-x

Reputation: 11832

I adapted Rob Raisch's answer into a regex that finds words containing one specific substring but not another:

\b(?![\w_]*Unwanted[\w_]*)[\w_]*Desired[\w_]*\b

So, for example, \b(?![\w_]*mon[\w_]*)[\w_]*day[\w_]*\b will find every word containing "day" (e.g. day, tuesday, daywalker), except those that also contain "mon" (e.g. monday).
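
A quick way to check this is a small JavaScript sketch like the one below. The sample text and the names dayNotMon and found are assumptions for the example; the pattern is case-sensitive as written, so the input is lowercase.

```javascript
const text = 'day tuesday daywalker monday';

// [\w_] behaves the same as \w, since \w already includes the underscore.
const dayNotMon = /\b(?![\w_]*mon[\w_]*)[\w_]*day[\w_]*\b/g;

// With the g flag, match() returns every matched substring (or null if none).
const found = text.match(dayNotMon) || [];

console.log(found); // [ 'day', 'tuesday', 'daywalker' ]
```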

Maybe useful for someone.

Upvotes: 4

This may work:

\w*(?<!girl)friend

You could also try

\w*(?<!girl)friend\w*

if you wanted to match words like befriended or boyfriends.

I'm not sure if ?<! is available in all regex versions, but this expression worked in Expresso (which I believe is .NET).
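
For what it's worth, JavaScript engines that implement ES2018 also accept lookbehind, so the first expression can be tried there as well. This is only a sketch under that assumption; the names sentence, notGirlfriend, and hits are illustrative.

```javascript
const sentence =
  'A boyfriend and girlfriend gained a friend when they asked to befriend them';

// (?<!girl) rejects a "friend" that is immediately preceded by "girl".
const notGirlfriend = /\w*(?<!girl)friend/g;

// With the g flag, match() returns every matched substring (or null if none).
const hits = sentence.match(notGirlfriend) || [];

console.log(hits); // [ 'boyfriend', 'friend', 'befriend' ]
```

Because the pattern has no \b anchors, it can also match the 'friend' part of a longer word such as 'friendly'.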

Upvotes: 11

morja

Reputation: 8560

Try this:

/\b(?!girl)\w*friend\b/ig

Upvotes: 7
