Guixing Bai

Reputation: 350

How to keep only the punctuation in a string in Python?

I want to create a catalog of all logs, so I want to keep only the punctuation and remove all other characters, including CJK and others.

For example:

s = "aaa; sf = fa = bla http://wa"

The expected output is:

;==://

Upvotes: 3

Views: 2670

Answers (2)

Ashwini Chaudhary

Reputation: 251196

You can use str.translate (Python 2, where string.letters exists and translate accepts a deletechars argument):

>>> from string import letters, digits, whitespace, punctuation
>>> s = "aaa; sf = fa = bla http://wa"
>>> s.translate(None, letters+digits+whitespace)
';==://'
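The session above relies on Python 2. As a rough Python 3 sketch (the helper name keep_punctuation is hypothetical, and it assumes "punctuation" means only the ASCII set in string.punctuation), filtering by membership also drops CJK and other non-ASCII characters:

```python
from string import punctuation

# Hypothetical helper, not part of the original answer.
# Assumption: "punctuation" means the ASCII set in string.punctuation.
_PUNCT = frozenset(punctuation)

def keep_punctuation(s):
    """Return only the characters of s that are ASCII punctuation."""
    return "".join(ch for ch in s if ch in _PUNCT)

print(keep_punctuation("aaa; sf = fa = bla http://wa"))  # ;==://
print(keep_punctuation("日志; 错误 = 1"))  # ;=
```

Because this is a whitelist, anything outside string.punctuation is removed, including full-width CJK punctuation. If that should be kept, the set would need to be extended.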

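or a regex (re.escape keeps characters such as \, ] and ^ in punctuation from being misread inside the character class):

>>> import re
>>> re.sub(r'[^{}]+'.format(re.escape(punctuation)),'',s)
';==://'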

Timing comparisons:

>>> s = "aaa; sf = fa = bla http://wa"*1000
>>> %timeit s.translate(None,letters+digits+whitespace)
10000 loops, best of 3: 171 us per loop                  #winner
>>> r1 = re.compile(r'[^{}]+'.format(punctuation))
>>> r2 = re.compile(r'[\w\s]+')
>>> %timeit r1.sub('',s)
100 loops, best of 3: 2.64 ms per loop
>>> %timeit r2.sub('',s)
100 loops, best of 3: 3.31 ms per loop
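The %timeit lines above are IPython magics run under Python 2. To repeat the comparison elsewhere, a minimal sketch with the standard timeit module could look like this, assuming Python 3, where str.maketrans builds the deletion table. The variable names are illustrative, and the numbers will differ by machine and version, so none are shown:

```python
import re
import timeit
from string import ascii_letters, digits, whitespace

s = "aaa; sf = fa = bla http://wa" * 1000

# Python 3: the third argument of str.maketrans lists characters to delete.
table = str.maketrans("", "", ascii_letters + digits + whitespace)
pattern = re.compile(r"[\w\s]+")

# Both approaches should agree on this ASCII-only input.
assert s.translate(table) == pattern.sub("", s)

# Time each approach over the same number of runs.
t_translate = timeit.timeit(lambda: s.translate(table), number=200)
t_regex = timeit.timeit(lambda: pattern.sub("", s), number=200)
print("translate: {:.6f}s  regex: {:.6f}s".format(t_translate, t_regex))
```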

Upvotes: 16

Paulo Scardine

Reputation: 77399

Using a regular expression:

>>> import re
>>> re.sub(r'[\w\s]+', '', "aaa; sf = fa = bla http://wa")
';==://'
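One caveat with this pattern: \w also matches the underscore, so _ is removed even though it is in string.punctuation. If that matters, a sketch along these lines (Python 3 assumed, the function name strip_non_punctuation is hypothetical) negates an escaped punctuation class instead:

```python
import re
from string import punctuation

# Assumption: only ASCII punctuation from string.punctuation should survive.
# re.escape protects characters such as \, ], ^ and - inside the class.
NOT_PUNCT = re.compile("[^{}]+".format(re.escape(punctuation)))

def strip_non_punctuation(s):
    """Remove every run of characters that is not ASCII punctuation."""
    return NOT_PUNCT.sub("", s)

print(re.sub(r"[\w\s]+", "", "snake_case = 1"))  # =
print(strip_non_punctuation("snake_case = 1"))   # _=
```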

Compiling can buy some performance, even for such a simple pattern...

>>> %timeit re.sub(r'[\w\s]+', '', "aaa; sf = fa = bla http://wa")
100000 loops, best of 3: 6.78 us per loop

>>> e = re.compile(r'[\w\s]+')
>>> %timeit e.sub('', "aaa; sf = fa = bla http://wa")
100000 loops, best of 3: 4.91 us per loop

...but a regular expression is no match for Ashwini's solution using str.translate:

>>> %timeit "aaa; sf = fa = bla http://wa".translate(None,letters+digits+whitespace)
1000000 loops, best of 3: 1.31 us per loop

Upvotes: 3
