user3535492

Reputation: 87

Python: Regex Search

I want to split a sentence on multiple delimiters:

.?!\n

However, I want to keep a comma attached to the word it follows. For example, for the string

'Hi, How are you?'

I want the result

['Hi,', 'How', 'are', 'you', '?']

I tried the following, but I am not getting the required result:

words = re.findall(r"\w+|\W+", text)

Upvotes: 2

Views: 82

Answers (3)

Quinn

Reputation: 4504

If using re.findall:

>>> import re
>>> ss = """
... Hi, How are
...
... yo.u
... do!ing?
... """
>>> [w for w in re.findall(r'(\w+,?|[.?!]?)?\s*', ss) if w]
['Hi,', 'How', 'are', 'yo', '.', 'u', 'do', '!', 'ing', '?']
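
The filter is needed because that pattern can also match the empty string. A variant that should not need the filter, as a rough sketch (the names TOKEN_RE and tokenize are my own, and it assumes anything that is not whitespace or one of . ? ! belongs to a word):

```python
import re

# Either a run of characters that are neither whitespace nor . ? !
# (so a trailing comma stays attached to its word), or a single . ? !
TOKEN_RE = re.compile(r'[^\s.?!]+|[.?!]')

def tokenize(text):
    """Return words (with any attached comma) and . ? ! as separate tokens."""
    return TOKEN_RE.findall(text)

print(tokenize('Hi, How are you?'))  # ['Hi,', 'How', 'are', 'you', '?']
```

Note that this keeps any other punctuation (quotes, semicolons and so on) attached to the neighbouring word as well, which may or may not be what you want.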

Upvotes: 2

timgeb

Reputation: 78700

Use re.split with a capturing group so that the delimiters are kept, then filter out the strings that contain only whitespace.

>>> import re
>>> s = 'Hi, How are you?'
>>> [x for x in re.split(r'(\s|!|\.|\?|\n)', s) if x.strip()]
['Hi,', 'How', 'are', 'you', '?']
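
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the names SPLIT_RE and split_keep_delims are made up for illustration, and the character class is just a shorter way of writing the alternation above (\s already covers \n).

```python
import re

# The capturing group makes re.split return the delimiters too.
SPLIT_RE = re.compile(r'(\s|[.?!])')

def split_keep_delims(text):
    """Split on whitespace and . ? !, keeping the punctuation as tokens."""
    # Drop empty strings and pure-whitespace pieces produced by the split.
    return [tok for tok in SPLIT_RE.split(text) if tok.strip()]

print(split_keep_delims('Hi, How are you?'))  # ['Hi,', 'How', 'are', 'you', '?']
print(split_keep_delims('Wait.\nWhat now!'))  # ['Wait', '.', 'What', 'now', '!']
```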

Upvotes: 4

hruske

Reputation: 2253

You can use:

re.findall(r'(.*?)([\s\.\?!\n])', text)

With a bit of itertools magic and list comprehensions:

[i.strip() for i in itertools.chain.from_iterable(re.findall(r'(.*?)([\s\.\?!\n])', text)) if i.strip()]

And a bit more comprehensible version:

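import itertools
import re

words = []
# Flatten the (text, delimiter) pairs returned by findall into one sequence.
found = itertools.chain.from_iterable(re.findall(r'(.*?)([\s\.\?!\n])', text))
for i in found:
    w = i.strip()
    # Skip empty pieces and whitespace-only delimiters.
    if w:
        words.append(w)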
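
One caveat worth checking for your data: this pattern only yields text that is followed by a delimiter, so a final word with nothing after it is dropped. A small sketch that shows the difference (split_pairs is just a name I picked for the example):

```python
import itertools
import re

def split_pairs(text):
    """Flatten the (text, delimiter) pairs from findall and drop blank pieces."""
    pairs = re.findall(r'(.*?)([\s.?!])', text)
    return [p.strip() for p in itertools.chain.from_iterable(pairs) if p.strip()]

print(split_pairs('Hi, How are you?'))  # ['Hi,', 'How', 'are', 'you', '?']
print(split_pairs('Hi, How are you'))   # ['Hi,', 'How', 'are'] -- 'you' is lost
```

If the input might not end with one of the delimiters, appending a newline to the text before matching is one simple way around it.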

Upvotes: 0
