Reputation: 87
I wanted to split a sentence on multiple delimiters:
. ? ! \n (and spaces)
However, I want to keep the comma attached to the word. For example, for the string
'Hi, How are you?'
I want the result
['Hi,', 'How', 'are', 'you', '?']
I tried the following, but I'm not getting the required result:
words = re.findall(r"\w+|\W+", text)
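Here is a quick sketch of what that attempt actually returns on the example string (the `text` value is just the sentence above). `\W+` swallows the comma together with the following space, so the comma never stays attached to "Hi".

```python
import re

text = 'Hi, How are you?'

# \w+ grabs runs of word characters; \W+ grabs every run of
# non-word characters, so ", " comes back as a single token.
words = re.findall(r"\w+|\W+", text)
print(words)  # ['Hi', ', ', 'How', ' ', 'are', ' ', 'you', '?']
```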
Upvotes: 2
Views: 82
Reputation: 4504
Using re.findall:
>>> import re
>>> ss = """
... Hi, How are
...
... yo.u
... do!ing?
... """
>>> [w for w in re.findall(r'(\w+,?|[.?!]?)?\s*', ss) if w]
['Hi,', 'How', 'are', 'yo', '.', 'u', 'do', '!', 'ing', '?']
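Here is a minimal sketch of the same pattern wrapped in a small helper and run on the question's own string. `tokenize` is a hypothetical name, and the pattern is the one above written as a raw string.

```python
import re

def tokenize(text):
    # Each match is an optional word (with an optional trailing comma)
    # or a single . ? ! delimiter, followed by any whitespace. findall
    # returns only the captured group, so whitespace-only matches come
    # back as '' and are filtered out.
    pattern = r'(\w+,?|[.?!]?)?\s*'
    return [w for w in re.findall(pattern, text) if w]

print(tokenize('Hi, How are you?'))  # ['Hi,', 'How', 'are', 'you', '?']
```

Any character the pattern cannot consume (a semicolon, say, or a comma preceded by a space) never appears in the output, so `tokenize('a;b')` returns `['a', 'b']`.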
Upvotes: 2
Reputation: 78700
Use re.split and keep your delimiters (putting the pattern in a capturing group makes re.split return the separators too), then filter out the strings that contain only whitespace.
>>> import re
>>> s = 'Hi, How are you?'
>>> [x for x in re.split(r'([\s.?!])', s) if x.strip()]
['Hi,', 'How', 'are', 'you', '?']
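Here is a sketch of the same idea as a reusable function, using the hypothetical name `split_keep_delims`. The separator pattern is written as the character class `[\s.?!]` (`\s` already covers `\n`). The last two lines show why the capturing group matters: without it, `re.split` discards the separators.

```python
import re

def split_keep_delims(text):
    # The capturing group makes re.split return each separator as well;
    # the filter drops whitespace-only separators and the empty strings
    # that appear between adjacent delimiters.
    return [x for x in re.split(r'([\s.?!])', text) if x.strip()]

sample = 'Hi, How are\n\nyo.u\ndo!ing?'
print(split_keep_delims(sample))
# ['Hi,', 'How', 'are', 'yo', '.', 'u', 'do', '!', 'ing', '?']

print(re.split(r'[.?!]', 'you?'))    # ['you', '']
print(re.split(r'([.?!])', 'you?'))  # ['you', '?', '']
```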
Upvotes: 4
Reputation: 2253
You can use:
re.findall(r'(.*?)([\s.?!])', text)
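To see why the next step needs flattening, here is what that call returns for the question's string. With two capturing groups, `re.findall` gives back a list of `(word, delimiter)` tuples. `\s` already includes `\n`, so it is left out of the class here.

```python
import re

text = 'Hi, How are you?'

# The lazy (.*?) stops at the first delimiter, so each match
# yields one (word, delimiter) tuple.
pairs = re.findall(r'(.*?)([\s.?!])', text)
print(pairs)
# [('Hi,', ' '), ('How', ' '), ('are', ' '), ('you', '?')]
```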
With a bit of itertools magic and a list comprehension:
[i.strip() for i in itertools.chain.from_iterable(re.findall(r'(.*?)([\s.?!])', text)) if i.strip()]
And a more readable version:
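import itertools
import re

words = []
# Flatten the (word, delimiter) pairs into one sequence of strings
found = itertools.chain.from_iterable(re.findall(r'(.*?)([\s.?!])', text))
for i in found:
    w = i.strip()
    if w:  # skip pieces that were only whitespace
        words.append(w)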
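The readable loop can be packaged as a function. `split_words` is a hypothetical name used only for this sketch. One thing worth knowing about this approach: every word must be followed by a delimiter, so trailing text with no final `.`, `?`, `!` or whitespace is silently dropped, as the second call shows.

```python
import itertools
import re

def split_words(text):
    # Flatten the (word, delimiter) tuples into one stream of strings,
    # then keep only the pieces that are non-empty after stripping.
    found = itertools.chain.from_iterable(re.findall(r'(.*?)([\s.?!])', text))
    return [piece.strip() for piece in found if piece.strip()]

print(split_words('Hi, How are you?'))  # ['Hi,', 'How', 'are', 'you', '?']
print(split_words('Hi, How are you'))   # ['Hi,', 'How', 'are']
```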
Upvotes: 0