Reputation: 45

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'

I'm trying to read an HTML file, but when I extract the titles and URLs to compare them against my keyword list alist, I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'. The error is shown at this link (http://tinypic.com/r/307w8bl/8).

Code

for q in soup.find_all('a'):
    title = (q.get('title'))
    url = ((q.get('href')))
    length = len(alist)
    i = 0
    while length > 0:
        if alist[i] in str(title): #checks for keywords from html form from the titles and urls
            r.write(title)
            r.write("\n")
            r.write(url)
            r.write("\n")
        i = i + 1
        length = length -1
doc.close()
r.close()

A little background: alist contains a list of keywords that I compare against each title to find the links I want. The strange thing is that if alist contains two or more words, the code runs perfectly, but if it contains only one word, the error above appears. Thanks in advance.

Upvotes: 3

Answers (3)

xecgr

Reputation: 5193

If your list must stay a list of byte strings (str), try encoding the title variable before comparing:

>>> alist = ['á']  # byte string (str) holding non-ASCII bytes
>>> title = u'á'   # unicode string
>>> alist[0] in title
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> title and alist[0] in title.encode('utf-8')
True
>>>
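
Going the other way is also possible. Here is a minimal sketch, assuming the keywords arrive as UTF-8 encoded bytes (an assumption; the question does not say how alist is built). It decodes them once so the comparison is text against text. The names keywords_bytes and keywords are hypothetical.

```python
# -*- coding: utf-8 -*-
# Hypothetical keyword list, assumed to be UTF-8 encoded bytes.
keywords_bytes = [b"caf\xc3\xa9"]

# A title as text (unicode), like the values BeautifulSoup returns.
title = u"caf\xe9 menu"

# Decode each keyword once, then compare text with text.
keywords = [k.decode("utf-8") for k in keywords_bytes]
found = keywords[0] in title
```

Either direction works as long as both sides of the `in` test are the same type.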

Upvotes: 3

Neel

Reputation: 21315

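The problem is in str(title): you are trying to convert Unicode data to a byte string.

Why are you converting title to a string? You can use it directly.

soup.find_all returns a list of tags, and q.get('title') on each tag already gives you a Unicode string (or None if the attribute is missing).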
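
A rough sketch of what using the title directly could look like. The helper name title_matches is hypothetical, and it assumes the keywords are Unicode strings too. It also skips tags with no title attribute, for which q.get('title') gives None.

```python
def title_matches(title, keywords):
    """Return True if any keyword occurs in title; title may be None."""
    if title is None:
        # Tags without a title attribute have nothing to compare.
        return False
    # No str() call: the title is compared as the Unicode string it is.
    return any(keyword in title for keyword in keywords)
```

In the loop from the question this would replace the while block, e.g. `if title_matches(q.get('title'), alist):`.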

Upvotes: 0

RemcoGerlich

Reputation: 31270

Presumably, title is a Unicode string that can contain any kind of character; str(title) tries to turn it into a bytestring using the ASCII codec, but that fails because your title contains a non-ASCII character.
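
To illustrate, here is a small sketch with a made-up title containing the character from the error message. The explicit encode("ascii") call mimics what str(title) does implicitly in Python 2; encoding with UTF-8 instead succeeds.

```python
# Hypothetical title containing U+2019 (right single quotation mark).
title = u"It\u2019s a title"

try:
    # What Python 2's str(title) attempts behind the scenes.
    title.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# UTF-8 can represent the character, so this works.
encoded = title.encode("utf-8")
```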

What are you trying to do? Why do you need to turn the title into a bytestring?

Upvotes: 0
