Timon Knigge

Reputation: 304

Splitting text in Python but treating commas, periods, etc. as separate 'words'

I'm trying to break sentences down into words. Normally I'd use textstring.split(' '), but I also want commas and periods split off as separate tokens, so "No, thank you" would become ["No", ",", "thank", "you"] rather than ["No,", "thank", "you"].

I thought of doing it this way:

textstring.replace(",", " ,").replace(".", " .").split(' ')

But that feels a bit hacky. Is there any better way to do this?

Upvotes: 4

Views: 1581

Answers (1)

thefourtheye

Reputation: 239453

We can split them apart with a regular expression like this:

import re
textstring = "No, thank you"
print(re.findall(r'\w+|\S+', textstring))
# ['No', ',', 'thank', 'you']

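\w+ matches a run of consecutive alphanumeric characters (and _), and \S+ matches a run of consecutive non-whitespace characters. The | means match either the \w+ or the \S+ part. Because \w+ is tried first, words are captured on their own and any punctuation left over is picked up by \S+.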
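One thing to watch for: since \S+ grabs a whole run of non-space characters, adjacent punctuation such as "..." or "?!" comes back as a single token. If you would rather have every punctuation mark as its own token, a possible variation (just a sketch; the tokenize helper name is made up for illustration) is to replace \S+ with [^\w\s], which matches exactly one character that is neither a word character nor whitespace:

```python
import re

def tokenize(text):
    # Hypothetical helper, not part of the original answer.
    # \w+     -> a run of word characters (letters, digits, underscore)
    # [^\w\s] -> exactly one character that is neither word nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("No, thank you."))
# ['No', ',', 'thank', 'you', '.']

print(tokenize("Wait... what?!"))
# ['Wait', '.', '.', '.', 'what', '?', '!']

# For comparison, the original pattern keeps punctuation runs together:
print(re.findall(r"\w+|\S+", "Wait... what?!"))
# ['Wait', '...', 'what', '?!']
```

Which one is preferable depends on whether you want an ellipsis to count as one token or three.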

Upvotes: 5
