Tanay Mishra

Reputation: 57

Using the requests and BeautifulSoup modules in Python

My code is very short, but it gives an unexpected error:

import bs4
import requests

url = "https://www."+input("Enter The Name of the website: ")+".com"
req = requests.get(url)
html_text = req.text
htmls = bs4.BeautifulSoup(html_text, "html.parser").prettify()
with open("facebook.html", "w+") as file:
    file.write(htmls)

Running it produces this traceback:

Traceback (most recent call last):
  File "C:\Users\Tanay Mishra\PycharmProjects\First_Project\Web Scraping\web.py", line 9, in <module>
    file.write(htmls)
  File "C:\Users\Tanay Mishra\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>
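
For reference, the failure does not depend on the website or on BeautifulSoup. This minimal sketch uses only the standard library: it encodes the same character with cp1252, the codec named in the traceback, and then shows that UTF-8 handles it. The sample string is made up for illustration.

```python
# U+200B ZERO WIDTH SPACE, the character named in the traceback
text = "before\u200bafter"

cp1252_failed = False
try:
    text.encode("cp1252")  # the codec open() used on this Windows machine
except UnicodeEncodeError as exc:
    cp1252_failed = True
    print("cp1252 failed:", exc.reason)  # "character maps to <undefined>"

# UTF-8 can encode every Unicode character, including U+200B
encoded = text.encode("utf-8")
print(encoded)
```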

Upvotes: 0

Views: 200

Answers (1)

Andrej Kesely

Reputation: 195418

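\u200b is the Unicode ZERO WIDTH SPACE character. Without an explicit encoding, open() uses the locale's default encoding, which on your Windows machine is cp1252, and cp1252 cannot represent this character. Specify encoding="utf-8" in the open() call. It is also good practice to pass the .content property of the Response object (the raw bytes) and let BeautifulSoup guess the encoding: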

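import bs4
import requests


url = "https://www."+input("Enter The Name of the website: ")+".com"
req = requests.get(url)
# Pass the raw bytes (.content) so BeautifulSoup detects the page's encoding itself
htmls = bs4.BeautifulSoup(req.content, "html.parser").prettify()
# Write as UTF-8 explicitly instead of relying on the Windows default (cp1252)
with open("facebook.html", "w", encoding="utf-8") as file:
    file.write(htmls)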
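
The file-writing half of the fix can be checked without network access or BeautifulSoup. The sketch below uses only the standard library. The helper name `save_html`, the sample HTML, and the temporary file name are hypothetical. It writes a string containing U+200B with an explicit UTF-8 encoding and reads it back unchanged:

```python
import os
import tempfile


def save_html(html: str, path: str) -> None:
    """Hypothetical helper: write HTML text as UTF-8 regardless of the OS locale."""
    with open(path, "w", encoding="utf-8") as file:
        file.write(html)


html = "<p>zero\u200bwidth</p>"  # contains the character from the traceback
path = os.path.join(tempfile.mkdtemp(), "page.html")
save_html(html, path)

# Read it back with the same explicit encoding
with open(path, encoding="utf-8") as file:
    round_trip = file.read()

print(round_trip == html)  # True
```

Reading the file back also needs `encoding="utf-8"`. Otherwise Windows decodes it with the locale codec, which produces mojibake or an error.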

Upvotes: 1
