Reputation: 304
I'm trying to break sentences down into words. Normally I'd use textstring.split(' '), but I also want to split off commas and periods as separate tokens, so "No, thank you" would be split into ["No", ",", "thank", "you"] rather than ["No,", "thank", "you"].
I thought of doing it this way:
textstring.replace(",", " ,").replace(".", " .").split(' ')
But that feels a bit hacky. Is there any better way to do this?
Upvotes: 4
Views: 1581
Reputation: 239453
We can split them apart with a regular expression like this:
import re

textstring = "No, thank you"
# findall returns every non-overlapping match, scanning left to right
print(re.findall(r'\w+|\S+', textstring))
# ['No', ',', 'thank', 'you']
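\w+ matches a run of consecutive alphanumeric characters and _, and \S+ matches a run of consecutive non-whitespace characters. The | means match either the \w+ or the \S+ part. Alternatives are tried left to right, so a word is matched by \w+ first and any punctuation that follows it is picked up by \S+.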
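One thing to watch for: because \S+ is greedy, adjacent punctuation such as "..." or "?!" comes back as a single token. If each punctuation character should be its own token, one possible variation is to replace \S+ with [^\w\s], which matches exactly one character that is neither a word character nor whitespace. The sketch below assumes that behaviour is what you want; the function name tokenize is just an illustrative choice, not part of the re module.

```python
import re

def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    # \w+      : a run of letters, digits, or underscores
    # [^\w\s]  : exactly one character that is neither a word character nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("No, thank you."))
# ['No', ',', 'thank', 'you', '.']

print(tokenize("Wait... what?!"))
# ['Wait', '.', '.', '.', 'what', '?', '!']

# For comparison, the original pattern keeps punctuation runs together
print(re.findall(r"\w+|\S+", "Wait... what?!"))
# ['Wait', '...', 'what', '?!']
```

Both patterns split contractions at the apostrophe ("don't" becomes 'don', "'", 't'), so if that matters the pattern would need a further tweak.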
Upvotes: 5