Chang Dae Hyun

Reputation: 35

Handling Unicode in Python 2

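#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import re
import sys
import time

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

# Python 2 only: force the implicit str/unicode codec to UTF-8.
reload(sys)
sys.setdefaultencoding('utf-8')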

class listener(StreamListener):

    def on_data(self, data):
        try:
            print data
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print tweet
            saveThis = str(time.time())+'::' + tweet
            saveFile = open("tweetDB3.csv", "a")
            saveFile.write(saveThis)
            saveFile.write("\n")
            saveFile.close()
            return True

        except BaseException, e:
            print "failed ondata,",str(e)
            time.sleep(5)

    def on_error(self, status):
        print status

# ckey, csecret, atoken and asecret are the Twitter API credentials (defined elsewhere).
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())

twitterStream.filter(track=['오늘'])

Example of result:

1465042178.01::RT @BTS_twt: korea#\ud83c\uddf0\ud83c\uddf7 https://t.co/zwKaGo4Lcj
1465042181.76::RT @wdfrog: \ud5e4\ub7f4\ub4dc \uacbd\uc81c\uac00 \uc774\ubc88 \uc77c\ub85c \uc0ac\uacfc\ubb38\uc744 \uc62c\ub838\uc9c0\ub9cc \uc774\uc790\ub4e4\uc740 \ubd88\uacfc 3\uac1c\uc6d4 \uc804\uc778 3\uc6d4 4\uc77c\uc5d0\ub3c4 \uc55e\uc73c\ub85c \uc870\uc2ec\ud558\uaca0\ub2e4\ub294 \uc0ac\uacfc\ubb38\uc744 \uc62c\ub9b0 \ubc14 \uc788\ub2e4. \uc77c\uc774 \ucee4\uc9c8\uae4c \uba74\ud53c\ud558\ub294 \uac83\uc774\ub2c8 \uc5b8\ub860\uc911\uc7ac\uc704\uc5d0 \ud55c\uce35 \uac00\uc5f4\ucc28\uac8c \ubbfc\uc6d0\uc744 \ub123\uc74d\uc2dc\ub2e4\nhttps://t.co/Wb\u2026

Question:

When I stream tweets from the Twitter API with the code above (tracking a Korean keyword), the output above is what ends up in the CSV file that I open in Excel: the text appears as Unicode escape sequences (\uXXXX) instead of Korean characters.
Each escape sequence corresponds to a Korean character, which I can see by running print u'string' on it.
Is it possible to have all of these escape sequences converted to Korean automatically? I have tried changing the Python code and also tried fixing it inside Excel, but with no luck.

Upvotes: 0

Views: 67

Answers (1)

Alexis Benichoux

Reputation: 800

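The sys.setdefaultencoding trick is not a reliable way to change the default encoding in Python 2.7: the function is deliberately removed from sys at startup (which is why reload(sys) is needed to get it back), and it only affects implicit str/unicode conversions, so it does nothing to the \uXXXX escapes in your output. You should use Python 3, where strings are Unicode and the default encoding is UTF-8.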
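
For what it's worth, the \uXXXX sequences in the output look like JSON string escapes, which survive because the tweet text is cut out of the raw payload with split() instead of being parsed. Below is a minimal Python 3 sketch of one way to handle that, assuming each payload is a JSON object with a "text" field, as the split on ',"text":"' in the question suggests. The helper names extract_text and append_row and the sample payload are made up for illustration, and the csv module is swapped in for the hand-built "::" separator.

```python
import csv
import json
import time


def extract_text(raw):
    """Parse one raw JSON payload and return its "text" field as a str.

    json.loads turns \\uXXXX escapes (including surrogate pairs used
    for emoji) into the actual characters.
    """
    return json.loads(raw)["text"]


def append_row(path, text, now=None):
    """Append one "timestamp,text" row to a UTF-8 encoded CSV file."""
    stamp = time.time() if now is None else now
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([stamp, text])


# Hypothetical payload, escaped the same way as the output in the question.
sample = '{"text":"\\uc624\\ub298 korea", "source":"web"}'
decoded = extract_text(sample)
print(decoded)
```

Inside on_data, the same idea would be tweet = extract_text(data) followed by append_row("tweetDB3.csv", tweet). Letting csv.writer build the row also means a tweet containing a comma or a line break is quoted rather than splitting the row.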

Upvotes: 1
