user3822769

Reputation: 151

u'囧'.encode('gb2312') throws UnicodeEncodeError

Firefox can display '囧' in gb2312 encoded HTML. But u'囧'.encode('gb2312') throws UnicodeEncodeError.

1. Is there a map that Firefox uses to look up gb2312-encoded characters, find the matching 0/1 display matrix (bitmap glyph), and display it?

2. Is there a map for translating Unicode to gb2312, and is u'囧' simply not in that map?

Upvotes: 1

Views: 318

Answers (2)

Bruno Haible

Reputation: 1292

When people or software say that something is GB2312 encoded, they most often mean that it is encoded in GBK, a.k.a. CP936 from Microsoft. GB2312 is a subset of GBK that was used in the 1980s, but both are part of the same family of encodings.

Incidentally, the forthcoming WHATWG Encoding specification recommends treating any text labelled "gb2312" as GBK-encoded text.

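Therefore, try u'囧'.encode('gbk') or u'囧'.encode('cp936'). Python also accepts 'ms936' as an alias for the same codec; 'windows-936' is not among the aliases listed in the Python documentation.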
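A minimal sketch of that comparison, assuming a stock CPython with its bundled CJK codecs. The helper name try_encodings is made up for this illustration; it only reports which of the codec names can encode the character.

```python
# -*- coding: utf-8 -*-

def try_encodings(text, codecs=('gb2312', 'gbk', 'cp936')):
    """Map each codec name to the encoded bytes, or None if encoding fails.

    try_encodings is a hypothetical helper written for this example only.
    """
    results = {}
    for name in codecs:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            # The character has no code point in this character set.
            results[name] = None
    return results


results = try_encodings(u'\u56e7')  # u'\u56e7' is the character 囧
for name in sorted(results):
    status = 'cannot encode' if results[name] is None else repr(results[name])
    print('%s: %s' % (name, status))
```

The strict gb2312 codec should be the only one that reports a failure here, since cp936 is just another name for Python's gbk codec.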

Upvotes: 3

ictxiangxin

Reputation: 51

囧 is not in gb2312; use gb18030 instead. I guess Firefox may fall back to an extended encoding when it meets characters outside gb2312.
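A rough way to check this from Python, assuming the standard gb18030, gbk and gb2312 codecs: encode with gb18030, then decode the same bytes both strictly as gb2312 and leniently as gbk. The lenient decode only imitates what a browser might do with a "gb2312" label; it is not Firefox's actual code.

```python
# -*- coding: utf-8 -*-

text = u'\u56e7'  # the character 囧

# gb18030 covers all of Unicode, so this encode does not raise.
gb18030_bytes = text.encode('gb18030')

# A strict gb2312 decoder rejects these bytes...
try:
    gb18030_bytes.decode('gb2312')
    strict_gb2312_ok = True
except UnicodeDecodeError:
    strict_gb2312_ok = False

# ...while a decoder that treats the data as GBK recovers the character.
decoded_as_gbk = gb18030_bytes.decode('gbk')

print('strict gb2312 decode succeeded: %s' % strict_gb2312_ok)
print('gbk decode matches original: %s' % (decoded_as_gbk == text))
```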

Upvotes: 3
