Reputation: 185
I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib2 import urlopen
page2 = urlopen('http://pogoda.yandex.ru/moscow/').read().decode('utf-8')
page = urlopen('http://yasko.by/').read().decode('utf-8')
The "page = ..." line raises "UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 32: invalid continuation byte", but the "page2 = ..." line does not. Why?
The Cyrillic characters on yasko.by start at position 32. How can I decode them correctly?
Thanks!
Upvotes: 2
Views: 215
Reputation: 368924
The content of http://yasko.by/ is encoded with windows-1251, while the content of http://pogoda.yandex.ru/moscow/ is encoded with utf-8.
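To see why the error mentions byte 0xc3, here is a small offline sketch. The sample word is an assumption chosen for illustration (it is not taken from the actual page): in windows-1251 the letter "Г" is the single byte 0xC3, which UTF-8 treats as the start of a two-byte sequence, so the decoder complains when the next byte is not a valid continuation byte.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the word "Главная" encoded as windows-1251 bytes.
sample = u'\u0413\u043b\u0430\u0432\u043d\u0430\u044f'
raw = sample.encode('windows-1251')

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte that could not be decoded.
    utf8_ok = False
    bad_offset = exc.start

# Decoding with the encoding the bytes were written in round-trips cleanly.
text = raw.decode('windows-1251')
```

The same bytes decode without error once the matching codec is used.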
The "page = .." line should become:
page = urlopen('http://yasko.by/').read().decode('windows-1251')
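If the encoding is not known in advance, one possible approach is to try a few candidates in order. This is only a sketch: the helper name decode_with_fallback and the candidate list are assumptions, not part of urllib2. UTF-8 should come first, because windows-1251 accepts almost any byte sequence and would otherwise silently produce garbage for UTF-8 pages.

```python
# -*- coding: utf-8 -*-
def decode_with_fallback(raw, encodings=('utf-8', 'windows-1251')):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: candidates are tried strictly in the given order.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')


# Made-up sample bytes standing in for the two pages' content.
greeting = u'\u041f\u0440\u0438\u0432\u0435\u0442'  # "Привет"
text_1251, enc_1251 = decode_with_fallback(greeting.encode('windows-1251'))
text_utf8, enc_utf8 = decode_with_fallback(greeting.encode('utf-8'))
```

With real pages, the bytes returned by urlopen(...).read() would be passed in place of the sample bytes. Checking the charset declared in the response's Content-Type header or the page's meta tag is usually more reliable than guessing.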
Upvotes: 2