My code is very short, but it raises an unexpected error:
```python
import bs4
import requests

url = "https://www." + input("Enter The Name of the website: ") + ".com"
req = requests.get(url)
html_text = req.text
htmls = bs4.BeautifulSoup(html_text, "html.parser").prettify()
with open("facebook.html", "w+") as file:
    file.write(htmls)  # fails here with the UnicodeEncodeError below
```
```
Traceback (most recent call last):
  File "C:\Users\Tanay Mishra\PycharmProjects\First_Project\Web Scraping\web.py", line 9, in <module>
    file.write(htmls)
  File "C:\Users\Tanay Mishra\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>
```
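`\u200b` is the Unicode ZERO WIDTH SPACE. When no encoding is given, `open()` uses the platform's default encoding, which on this Windows machine is cp1252 (note `cp1252.py` in the traceback), and cp1252 has no byte for that character. Specify `encoding="utf-8"` in the `open()` call. It is also good practice to use the `.content` property of the `Response` object (the raw bytes) and let BeautifulSoup guess the encoding: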
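```python
import bs4
import requests

url = "https://www." + input("Enter The Name of the website: ") + ".com"
req = requests.get(url)
# Pass the raw bytes (.content) so BeautifulSoup detects the page's encoding itself
htmls = bs4.BeautifulSoup(req.content, "html.parser").prettify()
# Write UTF-8 explicitly instead of relying on the Windows default (cp1252)
with open("facebook.html", "w+", encoding="utf-8") as file:
    file.write(htmls)
```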
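To see why the `encoding` argument matters without hitting the network, here is a small standard-library sketch (no requests or BeautifulSoup). It reproduces the failure by encoding `\u200b` with cp1252, the codec named in the traceback, and then writes and re-reads a file with `encoding="utf-8"`. The sample HTML string and the `page.html` filename are made up for illustration.

```python
import os
import tempfile

# U+200B ZERO WIDTH SPACE, the character named in the traceback
zwsp = "\u200b"

# cp1252 has no byte for this character, so encoding it raises
try:
    zwsp.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print(exc)  # 'charmap' codec can't encode character '\u200b' ...

# UTF-8 can encode every Unicode code point
utf8_bytes = zwsp.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x8b'

# Hypothetical page content containing the problematic character
html = "<p>a" + zwsp + "b</p>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    # An explicit encoding works regardless of the platform default
    with open(path, "w", encoding="utf-8") as file:
        file.write(html)
    # Read back with the same encoding to confirm the round trip
    with open(path, encoding="utf-8") as file:
        round_trip = file.read()

print(round_trip == html)  # True
```

Reading the file back also needs `encoding="utf-8"`; otherwise the same platform default would be used to decode it.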