Why does encode not always work?

Question

I have Python code that reads RSS feeds written in Cyrillic script (for example, Russian). This is the code I use:

import feedparser
from urllib import urlencode
from urllib2 import Request, urlopen

# source_url and source_name are defined earlier in the script.
d = feedparser.parse(source_url)

# Loop over the entries of the RSS feed.
for e in d.entries:
    # feedparser returns the title as a unicode object;
    # encode it to UTF-8 bytes for the query string.
    title = e.title.encode('utf-8')

    # Get the URL of the entry, also as UTF-8 bytes.
    url = e.link.encode('utf-8')

    # Build the request; urlencode() percent-escapes spaces,
    # '&', '=' and non-ASCII bytes in each parameter.
    query = urlencode([('title', title), ('source', source_name), ('url', url)])
    address = 'http://example.org/save_link.php?' + query

    # Submit the link.
    req = Request(address)
    f = urlopen(req)
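For comparison, here is a minimal Python 3 sketch of building the same kind of address. `urllib.parse.urlencode` percent-encodes every value as UTF-8 by default, including spaces, `&`, `=` and Cyrillic characters, so no explicit `encode('utf-8')` is needed. The helper `build_address` and the sample title are hypothetical; `example.org/save_link.php` is the placeholder endpoint from the question.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_address(title, source_name, url):
    """Build the save_link.php request URL (hypothetical helper).

    urlencode() percent-encodes each value as UTF-8, so spaces,
    '&', '=' and Cyrillic characters are all escaped safely.
    """
    query = urlencode({'title': title, 'source': source_name, 'url': url})
    return 'http://example.org/save_link.php?' + query


address = build_address('Новости дня', 'Корреспондент',
                        'http://k.img.com.ua/rss/ua/news.xml')
print(address)  # ASCII only: the Cyrillic text is percent-encoded

# Parsing the query string back recovers the original text values,
# e.g. params['source'] == ['Корреспондент'].
params = parse_qs(urlsplit(address).query)
```

The round trip through `parse_qs` is a quick way to confirm that nothing was lost or double-encoded on the way into the URL.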

I use encode('utf-8') because the titles are in Cyrillic, and this works fine. An example of the RSS source is here. The problem appears when I read the list of RSS sources from another URL. In more detail, there is a web page that lists RSS sources (each source's URL together with its name in Cyrillic). An example of the list is here:

ua, Корреспондент, http://k.img.com.ua/rss/ua/news.xml
ua, Українська Правда, http://www.pravda.com.ua/rss/

The problem appears when I apply encode('utf-8') to the Cyrillic text read from this document: I get a UnicodeDecodeError. Does anybody know why?

ecatmur · Accepted Answer

encode will only raise UnicodeDecodeError if you call it on a str object: Python 2 then first tries to decode that str to unicode using the default (ASCII) codec, which fails on any non-ASCII byte; see http://wiki.python.org/moin/UnicodeDecodeError.
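Python 3 no longer performs that implicit decode, so the following sketch spells it out with `bytes` to show where the error comes from. The function `py2_style_encode` is hypothetical; it only mimics what Python 2's `str.encode('utf-8')` does when the default encoding is ASCII.

```python
def py2_style_encode(raw: bytes, encoding: str = 'utf-8') -> bytes:
    """Mimic Python 2's str.encode() (hypothetical helper).

    Python 2 first decodes the byte string with the default codec,
    which is ASCII unless changed, and only then encodes the text.
    """
    text = raw.decode('ascii')  # the implicit step: fails on non-ASCII bytes
    return text.encode(encoding)


# An ASCII-only name passes through unchanged.
print(py2_style_encode(b'Pravda'))  # b'Pravda'

# A UTF-8 encoded Cyrillic name fails in the implicit decode step.
raw_name = 'Корреспондент'.encode('utf-8')
try:
    py2_style_encode(raw_name)
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)  # ordinal not in range(128)
```

This is why the error names *decoding* even though the code only asked to *encode*.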

You need to decode the str object to unicode first:

name = name.decode('utf-8')

This will take a str in UTF-8 encoding and give you a unicode object.
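Applied to the list of sources, this means decoding the page's bytes once, right after reading them, and working with text from then on. A minimal Python 3 sketch, assuming the page is UTF-8 encoded and uses the "country, name, URL" line format shown in the question; `parse_sources` is a hypothetical helper, and the byte string stands in for what reading the list page would return.

```python
def parse_sources(raw: bytes, encoding: str = 'utf-8'):
    """Turn the raw bytes of the source list into (country, name, url) tuples.

    Decoding happens once, up front; everything after that is text.
    """
    text = raw.decode(encoding)
    sources = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two commas only, then trim each field.
        country, name, url = (part.strip() for part in line.split(',', 2))
        sources.append((country, name, url))
    return sources


# Stand-in for the bytes returned by reading the list page.
page = ('ua, Корреспондент, http://k.img.com.ua/rss/ua/news.xml\n'
        'ua, Українська Правда, http://www.pravda.com.ua/rss/\n').encode('utf-8')

for country, name, url in parse_sources(page):
    print(country, url)
```

Decoding at the boundary keeps the rest of the code free of `encode`/`decode` calls until the text has to leave the program again.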

It works for the code that you posted because feedparser returns feed data already decoded to unicode.
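When values can arrive either already decoded (as from feedparser) or as raw bytes (as from the list page), a small normalising function keeps the rest of the code uniform. A Python 3 sketch; the name `to_text` is hypothetical, and UTF-8 is assumed as the encoding of the raw bytes.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it only if it is still bytes.

    Already-decoded text (like feedparser's titles) is returned as is,
    so it is never decoded twice.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


title = 'Новости'                        # already text
name = 'Корреспондент'.encode('utf-8')   # raw UTF-8 bytes

# Both are now text and can be encoded to UTF-8 safely.
for item in (to_text(title), to_text(name)):
    print(type(item).__name__, len(item.encode('utf-8')))
```

In Python 2 the same check would be `isinstance(value, str)` for byte strings, since there `str` holds bytes and `unicode` holds text.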
