Trindaz

Reputation: 17849

Regex for word exclusion

I'm trying to write a regex that will match only the first and third words in the string:

term1 and term2

My first attempt was [^(\s|(and))]+, but it fails because

term1 anbd term2

gives me these 3 matches: ['term1','b','term2'] whereas I want it to return ['term1','anbd','term2']

Upvotes: 0

Views: 1022

Answers (5)

Kirill Polishchuk

Reputation: 56162

You can use the regex \b\w+\b to split your sentence into words, then take the 1st and 3rd.

import re
pat = re.compile(r'\b\w+\b')  # pre-compile the pattern
# for this example the pre-compiling doesn't really matter.
temp = pat.findall("Hello, beautiful world!")  # ['Hello', 'beautiful', 'world']
lst = [temp[0], temp[2]]  # sets lst to ["Hello", "world"]

Upvotes: 1

Karl Knechtel

Reputation: 61498

[] surround a character class - a set of characters to match, or not match. Your regex says "one or more characters, none of which are (, whitespace, |, a, n, d or )", which is why you get the result you do: the a, n and d in "anbd" are each excluded individually, leaving only "b".
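To see this concretely, here is a small sketch that runs the pattern from the question through Python's re module. The third test string ("sandwich") is my own illustration, not from the question; it shows the class eating the letters a, n and d even inside an unrelated word.

```python
import re

# The pattern from the question: a negated character class, not a group.
# It excludes '(', whitespace, '|', 'a', 'n', 'd' and ')' one character at a time.
pattern = re.compile(r'[^(\s|(and))]+')

result_and = pattern.findall("term1 and term2")
result_anbd = pattern.findall("term1 anbd term2")
result_sandwich = pattern.findall("sandwich")  # illustrative extra input

print(result_and)       # ['term1', 'term2']
print(result_anbd)      # ['term1', 'b', 'term2']
print(result_sandwich)  # ['s', 'wich']
```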

Getting correct answers to these sorts of things requires correct questions. What's special about the word "and" in your case? Do you want "every word that is not and", or do you want "the first and third word of the string, no matter what the words are", or just what?

Your description of the desired output in the second case sounds like you want "every word that is not and". There are much simpler ways to get this. Regexes are not really as useful as people want them to be.

The split method of strings cuts it into words. From there, we can use a list comprehension to filter out any words that are "and". It looks like:

[word for word in sentence.split() if word != "and"]

See? It's practically plain English.
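As a rough sketch, this is how the comprehension behaves on the two strings from the question. The helper name words_except and its excluded parameter are just illustrative choices of mine, not anything standard.

```python
def words_except(sentence, excluded="and"):
    """Return the whitespace-separated words of sentence, skipping `excluded`."""
    # str.split() with no arguments splits on any run of whitespace.
    return [word for word in sentence.split() if word != excluded]

print(words_except("term1 and term2"))   # ['term1', 'term2']
print(words_except("term1 anbd term2"))  # ['term1', 'anbd', 'term2']
```

Note that the comparison is exact and case-sensitive, so "And" or "and," would be kept.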

Upvotes: 0

John La Rooy

Reputation: 304137

Instead of regex, consider

sentence.split()[:3:2]

e.g.

>>> "term1 and term2".split()[:3:2]
['term1', 'term2']
>>> "term1 anbd term2".split()[:3:2]
['term1', 'term2']

Upvotes: 3

MRAB

Reputation: 20654

Match only the first and third words: (\S+)\s+\S+\s+(\S+)

EDIT: If you mean 'match all the words except the word "and"' then: \b(?!and\b)\S+\b
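A minimal sketch of both patterns in Python, assuming the input is a plain whitespace-separated string like the ones in the question (the variable names are mine):

```python
import re

# Interpretation 1: capture the first and third whitespace-separated words.
first_third = re.compile(r'(\S+)\s+\S+\s+(\S+)')

# Interpretation 2: every word except the exact word "and".
# (?!and\b) is a negative lookahead: it rejects a match that would start with
# "and" followed by a word boundary, so "anbd" is still accepted.
not_and = re.compile(r'\b(?!and\b)\S+\b')

pair = first_third.match("term1 anbd term2").groups()
kept_and = not_and.findall("term1 and term2")
kept_anbd = not_and.findall("term1 anbd term2")

print(pair)       # ('term1', 'term2')
print(kept_and)   # ['term1', 'term2']
print(kept_anbd)  # ['term1', 'anbd', 'term2']
```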

Upvotes: 5

Paul

Reputation: 141839

I just tested this, it works :)

\b([^a].*?\b|a[^n].*?\b|an[^d].*?\b)

Upvotes: 0
