user2350206

Reputation: 185

Python 2.7: issue with decode('utf-8')

I have:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib2 import urlopen

page2 = urlopen('http://pogoda.yandex.ru/moscow/').read().decode('utf-8')

page = urlopen('http://yasko.by/').read().decode('utf-8')

The line "page = ..." raises "UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 32: invalid continuation byte", but the line "page2 = ..." raises no error. Why?

The Cyrillic characters on yasko.by start at position 32. How do I decode the page correctly?

Thanks!

Upvotes: 2

Views: 215

Answers (1)

falsetru

Reputation: 368924

The content of http://yasko.by/ is encoded with windows-1251, while the content of http://pogoda.yandex.ru/moscow/ is encoded with utf-8.
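To illustrate why only one of the two pages fails, here is a small sketch that needs no network access. The sample string is made up for illustration and is not taken from either site. In windows-1251 every Cyrillic letter is a single byte (0xc3, the byte in the error message, is the letter Г), and such bytes generally do not form valid UTF-8 sequences:

```python
# -*- coding: utf-8 -*-
sample = u'Погода'  # illustrative text, not the actual content of either page

cp1251_bytes = sample.encode('windows-1251')
utf8_bytes = sample.encode('utf-8')

# Bytes produced by the utf-8 codec decode cleanly with the same codec.
assert utf8_bytes.decode('utf-8') == sample

# windows-1251 bytes are single-byte codes; read as UTF-8 they look like
# a lead byte followed by an invalid continuation byte.
try:
    cp1251_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the codec that matches the bytes recovers the text.
decoded = cp1251_bytes.decode('windows-1251')
```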

The "page = ..." line should become:

page = urlopen('http://yasko.by/').read().decode('windows-1251')
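Rather than hard-coding the encoding for each site, one option is to read the charset from the Content-Type response header when the server sends one. The following is a minimal sketch under that assumption; `charset_from_content_type` and `decode_body` are hypothetical helper names, and the header value and bytes below are sample data, not a real response:

```python
# -*- coding: utf-8 -*-
import re

def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset named in a Content-Type header value, or default."""
    match = re.search(r'charset=["\']?([\w.:-]+)', content_type or '', re.I)
    return match.group(1) if match else default

def decode_body(raw, content_type, default='utf-8'):
    """Decode raw bytes using the charset declared in the header, if any."""
    return raw.decode(charset_from_content_type(content_type, default))

# Sample bytes standing in for a downloaded page (not fetched from a real site).
raw = u'Привет'.encode('windows-1251')
text = decode_body(raw, 'text/html; charset=windows-1251')
```

In Python 2.7 the header value should be available from the response object as `urlopen(url).info().getheader('Content-Type')`. Some servers omit the charset there and declare it only in an HTML meta tag, in which case this sketch falls back to the default and an explicit encoding is still needed.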

Upvotes: 2
