Reputation: 93
My Code
import requests
from bs4 import BeautifulSoup
url = "http://www.quikr.com/jobs/direct-hiring-for-fresher-b.tech-diploma-iti-for-maruti-suzuki-gurgaon-W0QQAdIdZ293462666"
encode = 'utf-8'
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"DNT": "1",
"Upgrade-Insecure-Requests": "1"
}
response = requests.get(url, headers=headers)
encodeData = response.text.encode(encode)
soup = BeautifulSoup(encodeData, "html.parser")
print soup.prettify()
I am trying to scrape an HTML page with this very basic code, but I still get an error when I call prettify().
The error is:
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 7
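The error above can be reproduced offline, without requests or BeautifulSoup. The short Python 3 sketch below assumes the console uses code page 437 (a common Windows default), which has no slot for U+00A9 (©). The sample string and the CONSOLE_ENCODING name are made up for illustration.

```python
# Offline reproduction of the UnicodeEncodeError (Python 3).
# Assumption: the console encoding is cp437, a common Windows default code
# page that has no slot for U+00A9 (the copyright sign).
CONSOLE_ENCODING = "cp437"
snippet = "Copyright \xa9 2017"  # made-up text standing in for the page

try:
    snippet.encode(CONSOLE_ENCODING)
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character the codec rejected
    bad_char = snippet[exc.start]
    print(exc)
    print("rejected", repr(bad_char), "at position", exc.start)
```

The message should name the 'charmap' codec, because Python implements cp437 with its generic charmap codec. The position is counted within whatever string is being encoded, so it will differ from the question's position 7.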
Upvotes: 0
Views: 106
Reputation: 2767
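This is a common problem. The issue probably isn't with your code but with the console you're printing to. BeautifulSoup works with Unicode strings, and many consoles and editors can't display every Unicode character (for example, I get this error a lot when I print a soup in Sublime Text). When you print a Unicode string, Python 2 encodes it with the console's encoding; the 'charmap' error means that encoding (typically a legacy Windows code page) has no slot for u'\xa9' (©). Encoding the string to UTF-8 yourself before printing should do the trick. ASCII won't work here, since © is not an ASCII character.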
print soup.prettify().encode('utf-8')
I haven't tested this, but it may fix it for you.
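The line above is Python 2. In Python 3, print() encodes a str with the stream's own encoding, so the closest equivalent of printing .encode('utf-8') is writing UTF-8 bytes to the stream's underlying binary buffer. A minimal sketch: safe_print is a hypothetical helper, and an in-memory cp437 text stream stands in for the console.

```python
import io
import sys


def safe_print(text, stream=None):
    """Write text plus a newline to stream as UTF-8 bytes.

    Hypothetical helper: it bypasses the stream's own text encoding by
    writing to the underlying binary buffer, like printing
    text.encode('utf-8') in Python 2.
    """
    if stream is None:
        stream = sys.stdout
    stream.flush()  # push any pending text first so output stays in order
    stream.buffer.write(text.encode("utf-8") + b"\n")
    stream.buffer.flush()


# Stand-in for a cp437 console: a text stream over an in-memory byte buffer.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437")

# print("Copyright \xa9 2017", file=console) would raise UnicodeEncodeError here.
safe_print("Copyright \xa9 2017", console)
print(raw.getvalue())
```

Because the bytes skip the console's encoding, a console that is not set to UTF-8 may display garbled characters instead of raising. On Python 3.7+, sys.stdout.reconfigure(encoding="utf-8") (or errors="replace") is another option, as is setting the PYTHONIOENCODING environment variable.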
Upvotes: 2