carl

Reputation: 77

Converting Unicode to ASCII equivalent (SCRAPY)

I am using Scrapy to crawl articles from a news website and store them in MongoDB. After inserting, the documents in MongoDB contain escaped Unicode characters like this:

"article": "Satya Nadella, Microsoft\u2019s executive vice president of cloud and enterprise, has just been named the company\u2019s next CEO.

I have tried

FEED_EXPORT_ENCODING = "utf-8"

But it only worked when I ran the crawler and exported the data as a JSON file, not when storing the data in MongoDB.

In my spider.py file I wrote these lines to get the article:

item["article"] = response.xpath('//p/text()').getall()
item["article"] = ' '.join(item["article"])

How can I replace these characters with their ASCII equivalents?

Upvotes: 1

Views: 230

Answers (1)

carl

Reputation: 77

This solution worked for me (see "Character encoding in python to replace 'u2019' with '"):

from unidecode import unidecode

# Transliterate non-ASCII characters (here \u2019, the right single quotation mark) to ASCII
a = unidecode("Satya Nadella, Microsoft\u2019s executive vice president of cloud and enterprise, has just been named the company\u2019s next CEO.")
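If installing unidecode is not an option, something similar can be roughly approximated with the standard library. Note this is a different technique from unidecode: a small hand-written mapping for common typographic punctuation, followed by NFKD normalization to strip accents. The names PUNCTUATION and to_ascii are my own (hypothetical), the mapping is far from complete, and any character it does not cover is simply dropped, so treat it as a sketch only.

```python
import unicodedata

# Hypothetical, incomplete mapping for common "smart" punctuation; extend as needed.
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_ascii(text):
    """Return an ASCII-only approximation of text (lossy)."""
    text = text.translate(PUNCTUATION)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever is still outside ASCII (combining marks, unmapped symbols).
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Stand-in for the list returned by response.xpath('//p/text()').getall()
paragraphs = ["Microsoft\u2019s next CEO.", "Caf\u00e9 \u2013 open"]
article = to_ascii(" ".join(paragraphs))
print(article)
```

In the spider, the same call could wrap the joined string before it is assigned to item["article"].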

Upvotes: 1
