Firdoesh

Reputation: 93

Python BeautifulSoup error

My Code

import requests
from bs4 import BeautifulSoup

url = "http://www.quikr.com/jobs/direct-hiring-for-fresher-b.tech-diploma-iti-for-maruti-suzuki-gurgaon-W0QQAdIdZ293462666"
encode = 'utf-8'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1"
}
response = requests.get(url, headers=headers)
encodeData = response.text.encode(encode)
soup = BeautifulSoup(encodeData)
print soup.prettify()

I am trying to scrape an HTML page with this very basic code, but I still get an error when I call prettify().

The error is:

UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 7

Upvotes: 0

Views: 106

Answers (1)

Nolan Conaway

Reputation: 2767

This is a common problem. The issue probably isn't with your code, but with the console you're printing to. prettify() returns a Unicode string, and print has to encode it to the console's encoding; when that encoding (the 'charmap' codec in your traceback) has no mapping for a character such as u'\xa9', the encode fails. A lot of editor consoles don't play nicely with this (for example, I get this error a lot when I print a soup in Sublime Text). Encoding the string to UTF-8 yourself before printing should do the trick.

print soup.prettify().encode('utf-8')

I haven't tested this, but it may fix it for you.
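To see the mechanism without hitting the network, here is a minimal sketch that reproduces the failure on a short string and shows two ways around it. The sample text and the helper name encode_for_console are made up for illustration, and 'ascii' stands in for whatever limited encoding your console actually uses.

```python
# Sketch only: 'ascii' is a stand-in for a console encoding that cannot
# represent the copyright sign; your console's codec may differ.


def encode_for_console(text, encoding="ascii", errors="replace"):
    """Hypothetical helper: encode text, replacing unencodable characters."""
    return text.encode(encoding, errors)


# Same character as in the traceback (u'\xa9' is the copyright sign).
sample = u"\xa9 2017 Example"

# A strict encode to a codec that lacks the character raises the error.
try:
    sample.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = sample.encode("utf-8")

# Alternatively, keep the console's codec and substitute '?' for misses.
replaced = encode_for_console(sample)
```

The UTF-8 route keeps every character but only displays correctly if the console interprets the bytes as UTF-8. The errors="replace" route always prints, at the cost of turning unsupported characters into question marks.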

Upvotes: 2
