xaander1

Reputation: 1160

Remove trailing whitespace, Unicode characters, and a special character

How do I strip whitespace and a special character from a string in Python?

I am scraping some data, but the text I am getting is somewhat garbled. I figured I could clean it using join, strip, and encoding, but my output is unexpected.

# cleaner function

def string_cleaner(rouge_text):
    return (" ".join(rouge_text.strip()).encode('ascii', 'ignore').decode("utf-8")).replace("\\","")

print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\ "))
print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))

OUTPUT

[screenshot]

How do I clean my string and get normal text?

Upvotes: 0

Views: 737

Answers (1)

MaBekitsur

Reputation: 181

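I'm not sure what you mean by "clean my string and get normal text", but calling " ".join() on a string inserts a space between every character, which is probably the unexpected output you are seeing. Try joining with an empty string instead: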

def string_cleaner(rouge_text):
    # "" instead of " " in .join() method
    return ("".join(rouge_text.strip()).encode('ascii', 'ignore').decode("utf-8")).replace("\\","")

print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\ "))
print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))

OUTPUT:

>>> print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\ "))
Nokia 9 PureView- 5.99
>>> print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))
Mi Electronic ScooterBlackEU
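
If you would rather keep the brackets around "Black" than drop them, one option is to normalize the text before encoding to ASCII. The following is only a minimal sketch, not a definitive solution: clean_text is a hypothetical name, and it assumes that converting fullwidth characters to their ASCII equivalents and collapsing internal runs of whitespace into single spaces is what you want. The sample strings are shortened, and the backslash is written as "\\" so that it is an explicit literal backslash.

```python
import unicodedata


def clean_text(raw_text):
    # Hypothetical helper, not part of the original question or answer.
    # NFKC folds compatibility characters, such as the fullwidth
    # parentheses U+FF08 and U+FF09, into their ASCII equivalents.
    normalized = unicodedata.normalize("NFKC", raw_text)
    # Drop anything that is still outside ASCII, then remove backslashes.
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    without_backslashes = ascii_only.replace("\\", "")
    # split() with no arguments splits on any run of whitespace and ignores
    # leading/trailing whitespace, so " ".join() here joins words, not characters.
    return " ".join(without_backslashes.split())


print(clean_text("\n\t\t\t Nokia 9 PureView- 5.99\\ "))
print(clean_text("\n\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t "))
```

Note that " ".join() is applied to the list returned by split() here, so the spaces go between words rather than between individual characters.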

Upvotes: 2
