Chetan Ambi

Reputation: 169

Text processing in Python - how to handle invalid character strings

I am working on a text classification task, and my text contains invalid characters, as shown below. How can I decode these characters to their actual values? Any pointers would help.

"It wouldn\'t take much to do for **Ã\x86sop**,\n\n\n\n\n            would it?**â\x80\x9d** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**God forbid!**â\x80\x9d** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**Why should He forbid?**â\x80\x9d** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **â\x80\x9c**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cOf course I won\'t let him be murdered as I didn\'t\n\n\n\n\n            just now., Stay here, Alyosha, I\'ll go for a turn in the yard., My\n\n\n\n\n            head\'s begun to ache.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father\'s bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cAlyosha,â\x80\x9d he whispered apprehensively,\n\n\n\n\n            â\x80\x9cwhere\'s Ivan?â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cIn the yard., He\'s got a headache., He\'s on the\n\n\n\n\n            watch.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cGive me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cWhat does Ivan 
say?"

Upvotes: 1

Views: 703

Answers (1)

snakecharmerb

Reputation: 55770

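It looks like the data has been doubly encoded (are you using Python 2?): sequences such as `â\x80\x9d` are what you get when UTF-8 bytes are decoded as latin-1 at some point. It can be fixed by encoding to latin-1 and then decoding as UTF-8.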

>>> data.encode('latin-1').decode('utf-8')
"It wouldn't take much to do for **Æsop**,\n\n\n\n\n            would it?**”** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**God forbid!**”** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**Why should He forbid?**”** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **“**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.”\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            “Of course I won't let him be murdered as I didn't\n\n\n\n\n            just now., Stay here, Alyosha, I'll go for a turn in the yard., My\n\n\n\n\n            head's begun to ache.”\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father's bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            “Alyosha,” he whispered apprehensively,\n\n\n\n\n            “where's Ivan?”\n\n\n\n\n\n\n\n\n\n            “In the yard., He's got a headache., He's on the\n\n\n\n\n            watch.”\n\n\n\n\n\n\n\n\n\n            “Give me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.”\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            “What does Ivan say?"
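To apply the same round trip across a whole corpus, it can be wrapped in a small helper. This is a minimal sketch, assuming the text was UTF-8 that got decoded as latin-1 somewhere upstream; `fix_mojibake` is a hypothetical name, not a library function, and any string that does not fit that pattern is returned unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as latin-1.

    Hypothetical helper: assumes the latin-1/UTF-8 mix-up described above.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the string has characters outside latin-1 (so it was
        # probably decoded correctly already), or the recovered bytes are
        # not valid UTF-8. Leave the string as it is in both cases.
        return text


# Escapes are used so the sample survives copy and paste.
sample = "\u00e2\x80\x9cGod forbid!\u00e2\x80\x9d cried Alyosha."
print(fix_mojibake(sample))
```

Note that the repair is all-or-nothing per string: a string that mixes corrupted and correctly decoded characters comes back unchanged. If you control how the data is read, it is usually better to fix the decoding at the source (for example, by opening the file with `encoding='utf-8'`) than to repair it afterwards.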

Upvotes: 2
