Replacing non-ASCII characters in a string with spaces

Question

I have a string with a bunch of non-ASCII characters and I would like to remove them. I used the following function in Python 3:

def removeNonAscii(s): 
    return "".join(filter(lambda x: ord(x)<128, s))

str1 = "Hi there!\xc2\xa0My\xc2\xa0name\xc2\xa0is\xc2\xa0Blue "
new = removeNonAscii(str1)
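A small check of why the spaces disappear, assuming the code is run as-is under Python 3: in a str literal, "\xc2\xa0" is two code points, U+00C2 and U+00A0 (C2 A0 is also the UTF-8 encoding of a no-break space). Both are 128 or above, so the filter drops them and nothing is put in their place. The trailing space from the original string survives because it is plain ASCII.

```python
def removeNonAscii(s):
    # Keep only characters whose code point is below 128 (US-ASCII)
    return "".join(filter(lambda x: ord(x) < 128, s))

str1 = "Hi there!\xc2\xa0My\xc2\xa0name\xc2\xa0is\xc2\xa0Blue "

# "\xc2\xa0" in a Python 3 str is two separate non-ASCII code points
print([hex(ord(c)) for c in "\xc2\xa0"])  # ['0xc2', '0xa0']

# Both code points are removed, so the words run together
print(repr(removeNonAscii(str1)))  # 'Hi there!MynameisBlue '
```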

The new string becomes:

Hi there!MynameisBlue

Is it possible to put spaces in place of the removed characters, so that the string becomes:

Hi there! My name is Blue

nhahtdh · Accepted Answer

The code below does the same thing as your current code, except that it replaces each contiguous run of characters outside the US-ASCII range with a single space (ASCII 32) instead of deleting it.

import re

# Replace each run of one or more non-ASCII characters with a single space
new = re.sub(r'[^\x00-\x7f]+', " ", str1)
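A runnable sketch applying the regex to the string from the question. Each two-character "\xc2\xa0" run becomes one space. As a cross-check of the "one space per run" behaviour, it also includes a regex-free version built on itertools.groupby; the name replace_non_ascii_runs is hypothetical and is not part of the answer.

```python
import re
from itertools import groupby

str1 = "Hi there!\xc2\xa0My\xc2\xa0name\xc2\xa0is\xc2\xa0Blue "

# Regex approach: one space per run of characters outside \x00-\x7f
regex_result = re.sub(r'[^\x00-\x7f]+', " ", str1)

def replace_non_ascii_runs(s):
    """Hypothetical regex-free equivalent: group consecutive characters by
    whether they are non-ASCII, then emit one space for each such group."""
    return "".join(
        " " if non_ascii else "".join(chars)
        for non_ascii, chars in groupby(s, key=lambda c: ord(c) >= 128)
    )

print(repr(regex_result))  # 'Hi there! My name is Blue '

# Both approaches agree on the sample input
assert replace_non_ascii_runs(str1) == regex_result
```

The trailing space comes from the original string, not from the substitution; call .strip() on the result if it is unwanted.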

Note that ASCII control characters (U+0000 to U+001F and U+007F) are kept by the code above, just as they are by the code in the question.
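A short demonstration of that caveat, assuming a made-up sample string. The answer's pattern leaves \t and \x00 untouched. The second pattern, [^\x20-\x7e]+, is an assumed variant (not part of the answer) that keeps only printable ASCII, so control characters are collapsed into spaces too. Be aware that it also turns newlines and tabs into spaces.

```python
import re

# Hypothetical sample containing a tab, a NUL and one non-ASCII character
sample = "tab:\there, nul:\x00, accent:\xe9"

# Answer's pattern: only code points >= 128 are replaced
kept = re.sub(r'[^\x00-\x7f]+', " ", sample)

# Assumed variant: anything outside printable ASCII (\x20-\x7e) is replaced,
# including control characters such as \t and \x00
printable_only = re.sub(r'[^\x20-\x7e]+', " ", sample)

print(repr(kept))            # 'tab:\there, nul:\x00, accent: '
print(repr(printable_only))  # 'tab: here, nul: , accent: '
```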
