Reputation: 17849
I'm trying to write a regex that will match only the first and third words in the string:
term1 and term2
My first attempt was [^(\s|(and))]+, but it fails because
term1 anbd term2
gives me these 3 matches: ['term1','b','term2']
whereas I want it to return ['term1','anbd','term2']
Upvotes: 0
Views: 1022
Reputation: 56162
You can use the regex \b\w+\b to find each word in your sentence, then take the 1st and 3rd.
import re
pat = re.compile(r'\b\w+\b')  # pre-compile the pattern
# (for this example the pre-compiling doesn't really matter)
temp = pat.findall("Hello, beautiful world!")  # ['Hello', 'beautiful', 'world']
lst = [temp[0], temp[2]]  # sets lst to ['Hello', 'world']
Upvotes: 1
Reputation: 61498
Square brackets [] surround a character class: a set of characters to match, or not match. Your regex says "one or more characters, none of which are whitespace, (, ), |, a, n or d", which is why you get the result you do.
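A minimal Python check of that reading, using only the standard re module: inside the brackets the parentheses and the | are ordinary characters, so the question's pattern behaves exactly like the flatter class [^\s()|and]+ (the second pattern below is written just for comparison, it is not from the question).

```python
import re

text = "term1 anbd term2"

# The question's pattern: inside [...] the "(", ")" and "|" are literal characters
original = re.findall(r"[^(\s|(and))]+", text)

# The same excluded set, written without the misleading "grouping"
equivalent = re.findall(r"[^\s()|and]+", text)

print(original)    # ['term1', 'b', 'term2']
print(equivalent)  # ['term1', 'b', 'term2']
```

In "anbd", only the b is outside the excluded set, so it comes back as a match on its own.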
Getting correct answers to these sorts of things requires asking the right questions. What's special about the word "and" in your case? Do you want "every word that is not and", or "the first and third word of the string, no matter what the words are", or something else?
Your description of the desired output in the second case sounds like you want "every word that is not and". There are much simpler ways to get that. Regexes are not really as useful as people want them to be.
The split method of strings cuts a string into words (with no arguments, it splits on runs of whitespace). From there, we can use a list comprehension to filter out any words that are "and". It looks like:
[word for word in sentence.split() if word != "and"]
See? It's practically plain English.
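Wrapped in a small function and run on both strings from the question, the comprehension gives the output the asker described. The function name words_except_and is just a hypothetical name for this sketch.

```python
def words_except_and(sentence):
    # str.split() with no arguments splits on runs of whitespace
    return [word for word in sentence.split() if word != "and"]

print(words_except_and("term1 and term2"))   # ['term1', 'term2']
print(words_except_and("term1 anbd term2"))  # ['term1', 'anbd', 'term2']
```

The comparison is exact and case-sensitive, so "And" or "and," (with attached punctuation) would be kept.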
Upvotes: 0
Reputation: 304137
Instead of a regex, consider sentence.split()[:3:2], which keeps the words at indices 0 and 2 (the first and third), e.g.
>>> "term1 and term2".split()[:3:2]
['term1', 'term2']
>>> "term1 anbd term2".split()[:3:2]
['term1', 'term2']
>>>
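A short sketch of how the slice behaves on other inputs (first_and_third is a hypothetical helper name): [:3:2] starts at index 0, stops before index 3 and steps by 2, so it only ever picks indices 0 and 2.

```python
def first_and_third(sentence):
    # [:3:2] keeps indices 0 and 2, ignoring anything from index 3 on
    return sentence.split()[:3:2]

print(first_and_third("term1 anbd term2"))         # ['term1', 'term2']
print(first_and_third("one two three four five"))  # ['one', 'three']
print(first_and_third("only two"))                 # ['only']
```

Unlike indexing words[0] and words[2] directly, slicing never raises IndexError; a sentence with fewer than three words just yields a shorter list.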
Upvotes: 3
Reputation: 20654
Match only the first and third words (captured in groups 1 and 2): (\S+)\s+\S+\s+(\S+)
EDIT: If you mean 'match all the words except the word "and"' then: \b(?!and\b)\S+\b
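A minimal sketch of using both patterns from Python's re module; the helper names first_third_regex and non_and_words are hypothetical. The first pattern is used with re.match so its two groups hold the first and third words; the second is used with re.findall to collect every word that is not exactly "and".

```python
import re

def first_third_regex(sentence):
    # re.match anchors at the start; groups 1 and 2 hold the 1st and 3rd words
    m = re.match(r"(\S+)\s+\S+\s+(\S+)", sentence)
    return list(m.groups()) if m else []

def non_and_words(sentence):
    # The negative lookahead rejects a word only if it is exactly "and"
    return re.findall(r"\b(?!and\b)\S+\b", sentence)

print(first_third_regex("term1 anbd term2"))  # ['term1', 'term2']
print(non_and_words("term1 anbd term2"))      # ['term1', 'anbd', 'term2']
print(non_and_words("term1 and term2"))       # ['term1', 'term2']
```

Because of the \b inside the lookahead, words that merely start with "and" (such as "andy") are still matched.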
Upvotes: 5
Reputation: 141839
I just tested this, it works :)
\b([^a].*?\b|a[^n].*?\b|an[^d].*?\b)
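Running this pattern through Python's re.findall suggests it picks out the right words, but because [^a] can also match a space, it returns whitespace-only matches between them as well. One possible workaround, an assumption of this sketch rather than part of the answer, is to filter those out afterwards (raw_matches and words_only are hypothetical names).

```python
import re

# The answer's pattern, unchanged
PATTERN = re.compile(r"\b([^a].*?\b|a[^n].*?\b|an[^d].*?\b)")

def raw_matches(sentence):
    # One capture group wraps the whole alternation, so findall returns full matches
    return PATTERN.findall(sentence)

def words_only(sentence):
    # Assumed workaround: drop the whitespace-only matches that [^a] produces
    return [m for m in raw_matches(sentence) if m.strip()]

print(raw_matches("term1 and term2"))  # ['term1', ' ', ' ', 'term2']
print(words_only("term1 and term2"))   # ['term1', 'term2']
print(words_only("term1 anbd term2"))  # ['term1', 'anbd', 'term2']
```

Note that a word which merely starts with "and", such as "andrew", is skipped too.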
Upvotes: 0