Reputation: 3
Hello, I'm having trouble replacing text in an HTML document. I want to censor certain words with BeautifulSoup, but my code doesn't replace any content, and I get an error when I print the text nodes (not all of the text from the HTML is printed).
words = ['Shop','Car','Home','Generic','Elements']
page = urllib.urlopen("html1/index.html").read()
soup = BeautifulSoup(page, 'html.parser')
texts = soup.findAll(text=True)
for i in texts :
    if i == words :
        i = '***'
    print i
Does anyone know how to fix it?
Error:
Traceback (most recent call last):
  File "replacing.py", line 28, in <module>
    print i
  File "F:\Python\Python27\lib\encodings\cp852.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 25: character maps to <undefined>
Upvotes: 0
Views: 115
Reputation: 51155
You have two major issues here. The first is an encoding issue: you are trying to print a character (u'\u2019', a right single quotation mark) that your console's code page, cp852, cannot encode. For that, see the answers to:
UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function
Or, for a more in-depth explanation:
Python, Unicode, and the Windows console (now that I look at it more closely, it's probably outdated, but it's still an interesting read).
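As a minimal sketch of one workaround for the encoding error, the text can be encoded with the "replace" error handler before it is printed, so that unmappable characters become '?' instead of raising UnicodeEncodeError. The helper name to_console_safe is hypothetical, and the default of cp852 is taken from the traceback; on another machine the console encoding may differ.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_console_safe(text, encoding="cp852"):
    """Return text with characters the codec cannot encode replaced by '?'.

    The function name and the cp852 default are illustrative assumptions;
    cp852 is simply the code page named in the traceback.
    """
    # "replace" substitutes '?' for every character missing from the codec.
    return text.encode(encoding, "replace").decode(encoding)


sample = u"It\u2019s a shop"  # contains the character from the error message
print(to_console_safe(sample))
```

This trades fidelity for robustness: the quotation mark is lost, but the loop no longer stops at the first character the console cannot show.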
However, you also have a logic problem with your code.
if i == words:
This line doesn't check whether i is found in words; it compares i to the whole list of words. A string is never equal to a list, so the condition is always false. I would recommend making the following changes:
words = {'Shop','Car','Home','Generic','Elements'}
for i in texts:
    if i in words:
        i = '***'
Converting words to a set allows for average O(1) lookup, and using if i in words checks if i is found in words.
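One further caveat: assigning '***' to i only rebinds the loop variable, so the parsed document itself is not changed. Below is a minimal sketch of one way to write the change back, using Beautiful Soup's replace_with() on each text node. The helper names censor_text and censor_soup and the mask parameter are hypothetical, and whole-word matching inside longer text nodes is an assumption about what the censoring should do (the membership test above only matches a node whose entire text is one of the words).

```python
import re


def censor_text(text, words, mask="***"):
    """Replace every whole-word occurrence of the given words with mask."""
    # \b keeps "Shop" from matching inside "Shopping"; re.escape guards
    # against words that contain regex metacharacters.
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(w) for w in words))
    return pattern.sub(mask, text)


def censor_soup(soup, words, mask="***"):
    """Censor all text nodes of a parsed BeautifulSoup document in place."""
    # find_all returns a list, so replacing nodes while looping is safe.
    # Older bs4 releases spell the argument text=True, as in the question.
    for node in soup.find_all(string=True):
        censored = censor_text(node, words, mask)
        if censored != node:
            # replace_with() swaps the node inside the tree; rebinding the
            # loop variable would leave the document untouched.
            node.replace_with(censored)
    return soup
```

With the soup and words from the question, calling censor_soup(soup, words) and then printing the soup should show the masked text, subject to the encoding issue described above.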
Upvotes: 2
Reputation: 91
It looks like one of the characters you are trying to print has no mapping in the codec Python uses to write to your console. In other words, Python has the data for the character, but the console encoding has no symbol for it, so it can't be printed. Converting the HTML text to Unicode should solve your problem.
A good question on how to do that:
Convert HTML entities to Unicode and vice versa
Upvotes: 0