Wason

Reputation: 1493

Python 3 program can't display Chinese characters

I'm trying a simple Python exercise. The code snippet is from this site and is open source. The goal is to parse a web page and extract some text from it. The program is below; I run it with Python 3 and redirect the output to a file. But the file doesn't contain what I want: instead of Chinese characters, it shows Unicode escapes like "\u514d\u8d39\u4e0b\u8f7d". How can I do this correctly?

import sys, urllib.request 
import traceback
from bs4 import BeautifulSoup
url = "http://appstore.huawei.com/more/all"

def uprint(*objects, sep=' ', end='\n', file=sys.stdout):
    enc = file.encoding
    if enc == 'UTF-8':
        print(*objects, sep=sep, end=end, file=file)
    else:
        f = lambda obj: str(obj).encode(enc, errors='backslashreplace').decode(enc)
        print(*map(f, objects), sep=sep, end=end, file=file)

def crawl():
    req = urllib.request.Request( url )
    req.add_header('User-Agent', 'PyCrawler 0.2.0')
    data = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(data, 'lxml')  
    items_entry = soup.find_all( class_="list-game-app dotline-btn nofloat")    
    for item in items_entry:        
        title_tag = item.find_all("h4", class_="title")
        for title in title_tag:
            title_A = item.find_all("a")
            for title_a_item in title_A:
                output = str(title_a_item.string)                
                uprint(output)
    print(u"Finishing...")

if __name__ == "__main__":
    crawl()

Upvotes: 1

Views: 878

Answers (1)

Uriel

Reputation: 16174

Your cmd font probably does not support these characters (more specifically, Chinese characters), so they are displayed as Unicode escape sequences instead.

You can either look for a font that does support them (you can change the font from the settings, by clicking the icon of the cmd window), or use Python's IDLE, which can display these characters.
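
Another thing worth checking, since the output is redirected to a file: the `uprint` helper in the question encodes with `errors='backslashreplace'` whenever `file.encoding` is not exactly `'UTF-8'`. If that encoding cannot represent Chinese characters, the helper itself would produce text like `\u514d\u8d39\u4e0b\u8f7d`, regardless of the font. Below is a minimal sketch, assuming that is what happens here. The ASCII encoding is only a stand-in for whatever encoding the redirected stream really has, and `write_utf8` is a hypothetical helper name, not part of the original program.

```python
text = "\u514d\u8d39\u4e0b\u8f7d"  # the string from the question: 免费下载

# What the question's uprint() does when the stream encoding cannot
# represent the characters (ASCII is used here as a stand-in encoding).
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")
print(escaped)  # \u514d\u8d39\u4e0b\u8f7d


def write_utf8(lines, path):
    """Write each line to `path` as UTF-8, independent of the console encoding.

    Hypothetical helper for illustration: it bypasses stdout entirely,
    so neither the cmd code page nor the font is involved.
    """
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
```

In `crawl()`, the titles could be collected into a list and passed to `write_utf8(titles, "out.txt")` instead of calling `uprint`. If redirecting stdout is a must, setting the `PYTHONIOENCODING=utf-8` environment variable, or calling `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+ before printing, should have a similar effect.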

Upvotes: 1
