ctrlaltdel

Reputation: 145

How to remove punctuation in Python?

I have a problem. For example, take this sentence:

s = "AAA? BBB. CCC!" 

So, I do:

import string
table = str.maketrans('', '', string.punctuation)
s = s.translate(table)

That works fine. The new sentence is:

s = "AAA BBB CCC"

But if the input sentence is:

s = "AAA? BBB. CCC! DDD.EEE"

then after removing punctuation with the same method as above, I get:

s = "AAA BBB CCC DDDEEE"

but I need:

s = "AAA BBB CCC DDD EEE"

Are there any ideas or methods for solving this problem?

Upvotes: 9

Views: 37262

Answers (8)

horace

Reputation: 948

I know not everyone has this situation, but I am writing an internationalized app, which makes this a bit of a heavier lift. This is what I have come up with:

[Edited to add 'import regex'] - Thanks Andj

import regex

random_string = "~`!ќ®†њѓѕў‘“ъйжюёф №%:,)( ЛПМКё…∆≤≥“™ƒђ≈≠»"

clean_string = regex.sub( r'[^\w\s]', '', random_string )

print( clean_string )

Result is:

ќњѓѕўъйжюёф  ЛПМКёƒђ

This works with a wide range of alphabets and special characters in many languages. I've tested it on several languages, using every special character and a few regular characters on each keyboard. It does not catch everything: \w also matches digits and the underscore, so those are kept, and I still need to strip out a few special markers separately.

Straightforward but powerful. Hope that helps someone.
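
For the question's case, where words must stay separated, a possible variation is to replace punctuation with a space rather than delete it. The sketch below is not the regex approach above: it swaps in the standard library's unicodedata module and treats every character whose Unicode category starts with P (punctuation) or S (symbol) as something to replace. The function name strip_unicode_punctuation is hypothetical, and whether symbols should be dropped too is an assumption.

```python
import unicodedata

def strip_unicode_punctuation(text):
    # Hypothetical helper: replace punctuation (P*) and symbols (S*)
    # with a space, based on each character's Unicode category.
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] in ("P", "S") else ch
        for ch in text
    )
    # Collapse the runs of whitespace left behind by the replacement.
    return " ".join(cleaned.split())

print(strip_unicode_punctuation("AAA? BBB. CCC! DDD.EEE"))  # AAA BBB CCC DDD EEE
```

Unlike \w, this treats the underscore as punctuation, so it is replaced as well.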

Upvotes: 0

Vlad Bezden

Reputation: 89527

string.punctuation contains the following characters:

'!"#$%&\'()*+,-./:;<=>?@[\]^_`{|}~'

You can use str.maketrans and str.translate to delete every punctuation character from the string:

import string

'AAA? BBB. CCC! DDD.EEE'.translate(str.maketrans('', '', string.punctuation))

Output:

'AAA BBB CCC DDDEEE'
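
If the words need to stay separated, as in the question's DDD.EEE case, one option is to map each punctuation character to a space instead of deleting it, and then collapse the extra whitespace. A minimal sketch, where SPACE_TABLE and remove_punctuation are hypothetical names:

```python
import string

# Map every punctuation character to a space instead of deleting it.
SPACE_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def remove_punctuation(text):
    # Hypothetical helper: translate, then collapse runs of whitespace
    # so that "CCC! DDD" does not end up with a double space.
    return " ".join(text.translate(SPACE_TABLE).split())

print(remove_punctuation("AAA? BBB. CCC! DDD.EEE"))  # AAA BBB CCC DDD EEE
```

The split/join step also trims any leading or trailing space left where the string started or ended with punctuation.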

Upvotes: 8

Niranjana Raguraman

Reputation: 1

Try this:

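import string

# Every punctuation character except the period is simply deleted.
exclude = set(string.punctuation)
exclude.remove(".")
doc = "AAA? BBB. CCC! DDD.EEE"
for punctuation in exclude:
    doc = doc.replace(punctuation, "")
# Periods become spaces so that "DDD.EEE" is split into two words.
doc = doc.replace(".", " ")
# Splitting and re-joining collapses any repeated spaces.
doc = doc.split()
print(" ".join(doc))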

Upvotes: 0

9769953

Reputation: 12202

Use:

import re

" ".join(re.split(r'\W+', s))

That splits the string on all non-word characters, then joins the individual substrings by single spaces.
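
One caveat: when the string starts or ends with a non-word character, re.split returns an empty string at that end, so the joined result picks up a stray leading or trailing space. A possible variation, assuming only the words themselves are wanted, collects the word runs with re.findall instead (words_only is a hypothetical helper name):

```python
import re

def words_only(text):
    # Hypothetical helper: keep each run of word characters
    # (letters, digits, underscore) and join them with single spaces.
    return " ".join(re.findall(r"\w+", text))

print(words_only("AAA? BBB. CCC! DDD.EEE"))  # AAA BBB CCC DDD EEE
```

Because findall only returns the matches, there are no empty strings to filter out afterwards.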

Upvotes: 2

Rakesh

Reputation: 82765

This is one approach using str.strip and a simple iteration.

Example:

from string import punctuation

s = "AAA? BBB. CCC! DDD.EEE"

def cleanString(strval):
    return "".join(" " if i in punctuation else i for i in strval.strip(punctuation))

s = " ".join(cleanString(i) for i in s.split())
print(s)

Output:

AAA BBB CCC DDD EEE

Upvotes: 1

alpharoz

Reputation: 159

You can also do it like this:

punctuation = "!@#$%^&*()_+<>?:.,;"  # add whatever you want

s = "AAA? BBB. CCC!" 
for c in s:
    if c in punctuation:
        s = s.replace(c, "")

print(s)

Output:

AAA BBB CCC

Upvotes: 4

Optimus

Reputation: 709

Check this out:

if __name__ == "__main__":
    test_string = "AAA? BBB. CCC! DDD.EEE"
    result = "".join((char if char.isalpha() else " ") for char in test_string)
    print(result)


Result: AAA  BBB  CCC  DDD EEE
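
The result above still contains double spaces wherever a punctuation mark was followed by a space. A minimal sketch of one way to tidy that up, assuming single spaces between words are wanted (letters_only is a hypothetical helper name):

```python
def letters_only(text):
    # Hypothetical helper: replace every non-letter with a space...
    spaced = "".join(ch if ch.isalpha() else " " for ch in text)
    # ...then collapse the resulting runs of spaces into single ones.
    return " ".join(spaced.split())

print(letters_only("AAA? BBB. CCC! DDD.EEE"))  # AAA BBB CCC DDD EEE
```

Note that isalpha() is False for digits, so numbers are dropped along with the punctuation.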

Upvotes: 0

Bharat Jogdand

Reputation: 438

Try this code:

import re

input_str = "AAA? BBB. CCC! DDD.EEE"
output_str = re.sub('[^A-Za-z0-9]+', ' ', input_str)
print(output_str)

Output:

AAA BBB CCC DDD EEE

Upvotes: 5
