Reputation: 25058

correctly strip : char with Regex

I want to get the words in a text string in Python:

s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."

result = re.sub("\b[^\w\d_]+\b", " ",  s ).split()
print result

I am getting:

['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is:', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']

How can I get "is" rather than "is:" for strings that happen to contain a colon? I thought using \b would be enough...

Upvotes: 3

Answers (3)

gffbss

Reputation: 1701

As the other answers pointed out, you need to define a raw string literal using the r prefix, like so: r"..."

If you want to strip the periods, I believe you can simplify your regex to just:

result = re.sub(r"[^\w' ]", " ", s ).split()

Note that \w matches word characters (a-z, A-Z, 0-9 and the underscore), so the negated class [^\w' ] replaces everything that is not a word character, an apostrophe, or a space.

Digits count as word characters, so any numbers in your sentences are kept as words as well.
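To make that concrete, here is a small sketch of the substitution applied to the sentence from the question. The second sentence (`sample`) is made up purely for illustration; it is there to show that digits and apostrophes survive.

```python
import re

s = ("The saddest aspect of life right now is: science gathers "
     "knowledge faster than society gathers wisdom.")

# Replace every character that is not a word character, an apostrophe,
# or a space with a space, then split on whitespace.
words = re.sub(r"[^\w' ]", " ", s).split()
print(words)

# Hypothetical sentence: digits are word characters, so "101" is kept,
# and the apostrophe in "it's" is preserved by the character class.
sample = "Room 101: it's open."
tokens = re.sub(r"[^\w' ]", " ", sample).split()
print(tokens)  # ['Room', '101', "it's", 'open']
```

If underscores or other characters should be treated differently, the character class is the place to adjust.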

Upvotes: 1

Alexander O'Mara

Reputation: 60577

I think you intended to pass a raw string to re.sub (notice the r).

result = re.sub(r"\b[^\w\d_]+\b", " ",  s ).split()

Returns:

['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
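Note that the last item is still 'wisdom.' because there is no word boundary after a period at the very end of the string. If that matters, one possible alternative (a sketch, not part of the original pattern) is to collect the words directly with re.findall instead of substituting separators:

```python
import re

s = ("The saddest aspect of life right now is: science gathers "
     "knowledge faster than society gathers wisdom.")

# Collect runs of word characters; punctuation is never part of a match,
# so both the colon and the trailing period disappear.
found = re.findall(r"\w+", s)
print(found[7], found[-1])  # is wisdom
```

Be aware that this pattern would split a contraction such as "it's" into two items, since the apostrophe is not a word character.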

Upvotes: 1

jamylak

Reputation: 133634

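You forgot to make it a raw string literal (r"..."). Without the r prefix, Python turns \b into a backspace character before the regex engine ever sees it: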

>>> import re
>>> s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."
>>> re.sub("\b[^\w\d_]+\b", " ",  s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is:', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
>>> re.sub(r"\b[^\w\d_]+\b", " ",  s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
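The difference between the two literals can be checked directly. The sketch below uses a shortened sentence for brevity and spells out the non-raw pattern with explicit \x08 escapes, which should be the same string the question's literal produces, without triggering invalid-escape warnings on recent Python versions.

```python
import re

plain = "\b"   # normal literal: a single backspace character
raw = r"\b"    # raw literal: a backslash followed by the letter b

print(len(plain), plain == "\x08")  # 1 True
print(len(raw))                     # 2

s = "now is: science"

# Equivalent of the non-raw pattern: it looks for literal backspace
# characters, finds none, and leaves the string unchanged.
broken = re.sub("\x08[^\\w\\d_]+\x08", " ", s).split()

# Raw pattern: \b reaches the regex engine as a word boundary.
fixed = re.sub(r"\b[^\w\d_]+\b", " ", s).split()

print(broken)  # ['now', 'is:', 'science']
print(fixed)   # ['now', 'is', 'science']
```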

Upvotes: 1
