szaman

Reputation: 6756

List of all words matching regular expression

Let's assume that I have the string "Lorem ipsum dolor sit amet". I need a list of all words with length greater than 3. Can I do it with regular expressions?

e.g.

pattern = re.compile(r'some pattern')
result = pattern.search('Lorem ipsum dolor sit amet').groups()

`result` should then contain 'Lorem', 'ipsum', 'dolor' and 'amet'.

EDIT:

The words I mean can contain only letters and numbers.

Upvotes: 9

Views: 18704

Answers (4)

Tim Pietzcker

Reputation: 336078

>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolor sit? amet...')
['Lorem', 'ipsum', 'dolor', 'amet']

Take note that in Python 3, where all strings are Unicode, this will also find words that use non-ASCII letters:

>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolör sit? amet...')
['Lorem', 'ipsum', 'dolör', 'amet']
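Conversely, if you want Python 3 to treat `\w` as ASCII-only (which was the Python 2 default), you can pass the `re.ASCII` flag. A short sketch; note that "dolör" is then broken at the "ö" and neither piece is long enough to be returned:

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_], so non-ASCII letters no longer count
ascii_words = re.compile(r"\w{4,}", re.ASCII)
print(ascii_words.findall('Lorem, ipsum! dolör sit? amet...'))  # ['Lorem', 'ipsum', 'amet']
```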

In Python 2, you'd have to use the re.UNICODE flag together with a unicode string:

>>> myre = re.compile(r"\w{4,}", re.UNICODE)
>>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
[u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
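`\w` also matches the underscore, whereas the question's EDIT asks for letters and numbers only. One common idiom (an addition here, not part of the answer above) is the character class `[^\W_]`, read as "a word character that is not an underscore". A minimal Python 3 sketch comparing it with plain `\w`:

```python
import re

# [^\W_] = not a non-word character and not "_", i.e. letters and digits only
alnum_words = re.compile(r"[^\W_]{4,}")

text = 'Lorem_ipsum dolor sit amet 2024'
print(alnum_words.findall(text))     # ['Lorem', 'ipsum', 'dolor', 'amet', '2024']
print(re.findall(r"\w{4,}", text))   # ['Lorem_ipsum', 'dolor', 'amet', '2024']
```

With `[^\W_]`, the underscore acts as a separator, so "Lorem_ipsum" yields two words.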

Upvotes: 19

albertov

Reputation: 2334

>>> import re
>>> pattern = re.compile(r'\S{4,}')
>>> pattern.findall('Lorem ipsum dolor sit amet')
['Lorem', 'ipsum', 'dolor', 'amet']
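`\S` means "any non-whitespace character", so it works on the plain example string but keeps punctuation attached to words. The sketch below runs it against the punctuated string from the first answer, next to `\w{4,}`. Note that "sit?" now reaches four characters and is returned:

```python
import re

text = 'Lorem, ipsum! dolor sit? amet...'

# \S matches punctuation too, so it stays glued to the words
print(re.findall(r'\S{4,}', text))  # ['Lorem,', 'ipsum!', 'dolor', 'sit?', 'amet...']
# \w only matches word characters, so punctuation is excluded
print(re.findall(r'\w{4,}', text))  # ['Lorem', 'ipsum', 'dolor', 'amet']
```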

Upvotes: 0

krakover

Reputation: 3029

import re
pattern = re.compile(r"\w\w\w\w+")  # four or more word characters, no capturing group
result = pattern.findall('Lorem ipsum dolor sit amet')  # ['Lorem', 'ipsum', 'dolor', 'amet']
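A capturing group changes what `search()` and `findall()` give back, which matters with a pattern such as `\w\w\w(\w+)`. In this sketch, `search()` stops at the first match, and both `groups()` and `findall()` return only the captured tail of each word, not the word itself:

```python
import re

text = 'Lorem ipsum dolor sit amet'
grouped = re.compile(r"\w\w\w(\w+)")

# search() finds only the first match; groups() returns its captured part
print(grouped.search(text).groups())   # ('em',)
# findall() returns the captured group of every match, not the whole word
print(grouped.findall(text))           # ['em', 'um', 'or', 't']
# Without a capturing group, findall() returns each whole match
print(re.findall(r"\w{4,}", text))     # ['Lorem', 'ipsum', 'dolor', 'amet']
```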

Upvotes: 2

jsbueno

Reputation: 110146

This is a typical use case for list comprehensions in Python, which can be used for filtering:

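import re
text = 'Lorem ipsum dolor sit amet'
pattern = re.compile(r'\w+')  # assumed definition: the answer leaves pattern undefined; this matches every word
result = [word for word in pattern.findall(text) if len(word) > 3]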
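Keeping the length test in Python rather than in the regex makes the threshold an ordinary parameter. A small sketch with a hypothetical helper named `long_words` (not part of the answer), assuming "word" means a run of `\w` characters:

```python
import re

WORD = re.compile(r"\w+")  # hypothetical name: matches every run of word characters

def long_words(text, min_len=4):
    """Hypothetical helper: return the words in text with at least min_len characters."""
    return [word for word in WORD.findall(text) if len(word) >= min_len]

text = 'Lorem, ipsum! dolor sit? amet...'
print(long_words(text))             # ['Lorem', 'ipsum', 'dolor', 'amet']
print(long_words(text, min_len=5))  # ['Lorem', 'ipsum', 'dolor']
```

The same result can be had by building the quantifier into the pattern, e.g. `re.compile(r"\w{%d,}" % min_len)`. The comprehension keeps the regex simple at the cost of matching short words only to discard them.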

Upvotes: 2
