I am using Scrapy to crawl articles from a news website and store them in MongoDB. But after inserting, the text in MongoDB contains Unicode escape sequences such as \u2019, like this:
"article": "Satya Nadella, Microsoft\u2019s executive vice president of cloud and enterprise, has just been named the company\u2019s next CEO."
I have tried setting
FEED_EXPORT_ENCODING = "utf-8"
but that only helped when I ran the crawler and exported the data as a JSON file, not when storing the data in MongoDB.
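For context, \u2019 is the escaped form of U+2019 (RIGHT SINGLE QUOTATION MARK, the curly apostrophe). Python's json module writes non-ASCII characters as such escapes unless told otherwise. A minimal stdlib sketch of the difference (this illustrates JSON escaping in general; it is not Scrapy's exporter code):

```python
import json

text = "Microsoft\u2019s executive vice president"

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(text)
# ensure_ascii=False: the character itself is kept in the output string.
unescaped = json.dumps(text, ensure_ascii=False)

print(escaped)    # "Microsoft\u2019s executive vice president"
print(unescaped)  # "Microsoft’s executive vice president"
```

In both cases the Python string holds the same single character; only the serialized form differs.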
In my spider.py file, I get the article text with these lines:
item["article"] = response.xpath('//p/text()').getall()
item["article"] = ' '.join(item["article"])
How can I replace these characters with their ASCII equivalents?
This solution worked for me (from the question "Character encoding in python to replace 'u2019' with '"):
import unidecode
# unidecode transliterates non-ASCII characters to ASCII; U+2019 becomes a plain apostrophe
a = unidecode.unidecode("Satya Nadella, Microsoft\u2019s executive vice president of cloud and enterprise, has just been named the company\u2019s next CEO.")
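To apply a fix like this to every item before it reaches MongoDB, one option is a small item pipeline that runs ahead of the storage pipeline. This is a sketch under assumptions: the class name AsciiArticlePipeline, the helper to_ascii and its translation table are hypothetical, and the table is a stdlib stand-in that covers only a few common punctuation marks. Calling unidecode.unidecode inside process_item instead would handle far more characters.

```python
# Hypothetical translation table mapping common typographic characters to ASCII.
# A small stdlib stand-in for unidecode, which covers far more characters.
ASCII_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00a0": " ",    # NO-BREAK SPACE
})


def to_ascii(text: str) -> str:
    """Replace the characters listed in ASCII_PUNCTUATION; leave all others unchanged."""
    return text.translate(ASCII_PUNCTUATION)


class AsciiArticlePipeline:
    """Hypothetical Scrapy item pipeline that cleans item["article"]
    before a later pipeline (e.g. the MongoDB one) stores it."""

    def process_item(self, item, spider):
        # .get() avoids a KeyError when the spider did not set "article"
        if item.get("article"):
            item["article"] = to_ascii(item["article"])
        return item
```

To enable it, add it to ITEM_PIPELINES with a lower order number than the MongoDB pipeline, since Scrapy runs pipelines in ascending order, e.g. {"myproject.pipelines.AsciiArticlePipeline": 100} (the module path is hypothetical).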