Reputation: 6756
Let's assume that I have a string: "Lorem ipsum dolor sit amet". I need a list of all words with a length of more than 3. Can I do this with regular expressions?
e.g.
pattern = re.compile(r'some pattern')
result = pattern.search('Lorem ipsum dolor sit amet').groups()
result contains 'Lorem', 'ipsum', 'dolor' and 'amet'.
EDITED:
The words I mean can only contain letters and numbers.
Upvotes: 9
Views: 18704
Reputation: 336078
>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolor sit? amet...')
['Lorem', 'ipsum', 'dolor', 'amet']
Take note that in Python 3, where all strings are Unicode, this will also find words that use non-ASCII letters:
>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolör sit? amet...')
['Lorem', 'ipsum', 'dolör', 'amet']
In Python 2, you'd have to use
>>> myre = re.compile(r"\w{4,}", re.UNICODE)
>>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
[u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
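One caveat with respect to the question's edit (letters and numbers only): `\w` also matches the underscore. A minimal sketch, assuming underscores should not count as part of a word, is to use the class `[^\W_]` instead. The sample string with an underscore is made up for illustration, and `re.ASCII` (Python 3 only) is shown as one way to exclude non-ASCII letters if that is wanted:

```python
import re

# Hypothetical sample text containing an underscore and digits.
sample = "Lorem_ipsum dolor42 sit amet"

# \w treats the underscore as a word character, so the first two words fuse.
with_underscore = re.findall(r"\w{4,}", sample)
# ['Lorem_ipsum', 'dolor42', 'amet']

# [^\W_] means "a word character that is not an underscore",
# i.e. letters and digits only.
letters_digits_only = re.findall(r"[^\W_]{4,}", sample)
# ['Lorem', 'ipsum', 'dolor42', 'amet']

# With re.ASCII, \w and \W only consider [a-zA-Z0-9_],
# so 'dolör' is split at 'ö' and both pieces are too short.
ascii_only = re.findall(r"[^\W_]{4,}", "Lorem, ipsum! dolör sit? amet...", re.ASCII)
# ['Lorem', 'ipsum', 'amet']
```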
Upvotes: 19
Reputation: 2334
import re
pattern = re.compile(r'\S{4,}')  # four or more non-whitespace characters
pattern.findall('Lorem ipsum dolor sit amet')
# ['Lorem', 'ipsum', 'dolor', 'amet']
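Note that `\S` matches any non-whitespace character, so punctuation attached to a word counts towards its length and is returned with it. A small sketch of the difference, using a punctuated variant of the sample string:

```python
import re

text = 'Lorem, ipsum! dolor sit? amet...'

# \S counts punctuation, so 'sit?' reaches four characters and is kept.
non_space = re.findall(r'\S{4,}', text)
# ['Lorem,', 'ipsum!', 'dolor', 'sit?', 'amet...']

# \w only counts letters, digits and the underscore.
word_chars = re.findall(r'\w{4,}', text)
# ['Lorem', 'ipsum', 'dolor', 'amet']
```

For input that only contains words separated by spaces, both patterns give the same result.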
Upvotes: 0
Reputation: 3029
import re
# Three word characters followed by at least one more; findall returns every match.
pattern = re.compile(r"\w\w\w\w+")
result = pattern.findall('Lorem ipsum dolor sit amet')
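The choice between `search()` and `findall()`, and whether the pattern contains a capturing group, changes what comes back. A short sketch of what each call returns when the tail of the pattern is wrapped in a group, as in `\w\w\w(\w+)`:

```python
import re

text = 'Lorem ipsum dolor sit amet'
grouped = re.compile(r"\w\w\w(\w+)")

# search() stops at the first match ('Lorem'); groups() returns only
# the captured part, i.e. everything after the first three characters.
first_only = grouped.search(text).groups()
# ('em',)

# With exactly one group, findall() returns the group's text for each match.
captured = grouped.findall(text)
# ['em', 'um', 'or', 't']

# Without a group, findall() returns the whole of each match.
whole_words = re.findall(r"\w\w\w\w+", text)
# ['Lorem', 'ipsum', 'dolor', 'amet']
```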
Upvotes: 2
Reputation: 110146
That is a typical use case for list comprehensions in Python, which can be used for filtering:
import re
pattern = re.compile(r"\w+")  # assumed here: a pattern that matches every word
text = 'Lorem ipsum dolor sit amet'
result = [word for word in pattern.findall(text) if len(word) > 3]
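If the length threshold needs to vary, the filter can be wrapped in a small function. This is only a sketch: the name `words_longer_than` and its `min_length` parameter are hypothetical, and it assumes a word is a run of `\w` characters:

```python
import re

def words_longer_than(text, min_length=3):
    """Return the words in text that have more than min_length characters."""
    # Assumption: a "word" is any run of \w characters.
    return [word for word in re.findall(r"\w+", text) if len(word) > min_length]

result = words_longer_than('Lorem ipsum dolor sit amet')
# ['Lorem', 'ipsum', 'dolor', 'amet']

longer = words_longer_than('Lorem ipsum dolor sit amet', min_length=4)
# ['Lorem', 'ipsum', 'dolor']
```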
Upvotes: 2