ha22109

Reputation: 8326

Japanese in python function

I wrote a Python function that tells me whether two words are similar.

Now I want to pass Japanese text to the same function. It gives the error "not an ASCII character". I tried using UTF-8 encoding, but it still gives the same error:

Non-ASCII character '\xe3' in file

Is there any way to do that? I can't generate the msg file for that, since the two keywords will not be constant.

Here is the code:

def filterKeyword(keyword, adText, filterType):
    if (filterType == 'contains'):
        try :
            adtext = str.lower(adText)
            keyword = str.lower(keyword)
            if (adtext.find(keyword)!=-1):
                return '0'
        except:
            return '1'
    if (filterType == 'exact'):
        var = cmp(str.lower(adText), str.lower(keyword))
        if(var == 0 ):
            return '0'

    return '1'

I have used the following:

filterKeyword(unicode('ポケモン').encode("utf-8"), unicode('黄色のポケモン').encode("utf-8"), 'contains')

filterKeyword('ポケモン'.encode("utf-8"), '黄色のポケモン'.encode("utf-8"), 'contains')

Both of them give the error.

Upvotes: 0

Views: 826

Answers (5)

Joe Koberg

Reputation: 26719

Please note well:

unicode('ポケモン') (a non-unicode string constant passed to the unicode() constructor)

IS NOT THE SAME AS

u'ポケモン' (a unicode string constant)
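To make the difference concrete: in Python 2, unicode('ポケモン') first has to decode a byte string, and with no encoding argument it falls back to the interpreter's default codec, which is ASCII unless the installation was changed (an assumption here). The sketch below reproduces that step with an explicit decode call so that it behaves the same on Python 2 and Python 3; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-

text = u'ポケモン'             # a unicode string constant: 4 characters
raw = text.encode('utf-8')    # the 12 bytes a plain 'ポケモン' literal holds in a UTF-8 source file

# In Python 2, unicode(raw) with no encoding argument acts like raw.decode('ascii')
try:
    raw.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    # The first byte is 0xe3, which is outside the ASCII range
    ascii_decode_failed = True

# Naming the real encoding recovers the original text
round_trip = raw.decode('utf-8')
```

The u'...' literal never goes through that decoding step at run time, which is why it is the safer spelling.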

Upvotes: 0

S.Lott

Reputation: 391952

Please do not do this:

adtext = str.lower(adText)
keyword = str.lower(keyword)

Please do this:

adtext= adText.lower()
keyword = keyword.lower()

Please do not do this:

cmp(str.lower(adText), str.lower(keyword))

Please do this:

return adText.lower() == keyword.lower()

Please do not do this:

try:
    # something
except:
    # handler

Please name the exception you expect. A general superclass like Exception is fine. A bare except also catches things such as SystemExit and KeyboardInterrupt, which are not ordinary errors and which you cannot meaningfully handle.

try:
    # something
except Exception:
    # handler

Also, it's really unlikely that catching an exception would return True.

Please do not do this:

return '1' 
return '0'

It's unlikely you want to return a character. It's more likely you want to return True or False.

return True
return False

Your code will work, if you do things properly.

>>> u'ポケモン'.lower() == u'黄色のポケモン'.lower()
False
>>> u'ポケモン'.lower() in  u'黄色のポケモン'.lower()
True
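Putting those suggestions together, the function might look roughly like the sketch below. The name filter_keyword and the snake_case parameter names are mine, not the asker's, and returning False for an unrecognized filter type is an assumption that mirrors the original fall-through to '1'. It expects unicode arguments.

```python
# -*- coding: utf-8 -*-

def filter_keyword(keyword, ad_text, filter_type):
    """Return True when keyword matches ad_text under filter_type.

    Illustrative rewrite; expects unicode (text) arguments.
    """
    keyword = keyword.lower()
    ad_text = ad_text.lower()
    if filter_type == 'contains':
        # Substring test replaces find(...) != -1
        return keyword in ad_text
    if filter_type == 'exact':
        # Equality test replaces cmp(...) == 0
        return keyword == ad_text
    # Unknown filter types count as "no match", as in the original code
    return False


contains_result = filter_keyword(u'ポケモン', u'黄色のポケモン', 'contains')
exact_result = filter_keyword(u'ポケモン', u'黄色のポケモン', 'exact')
```

Note that the sense of the result is flipped relative to the original: a match is True here, where the original returned '0'.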

Upvotes: 1

Daniel Stutzbach

Reputation: 76737

This worked for me:

# -*- coding: utf-8 -*-

def filterKeyword(keyword, adText, filterType):
    # same as yours

filterKeyword(u'ポケモン', u'黄色のポケモン', 'contains')

Upvotes: 3

Jacek Konieczny

Reputation: 8604

Put:

# -*- coding: utf-8 -*-

in one of the first two lines of your script. This way the interpreter knows which encoding is used for the source code and the string literals in it.

And use Unicode strings wherever possible. With luck, the function may work with Unicode arguments (e.g. u"something…" instead of "something...") even if it was not written with Unicode in mind.

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 799092

Don't pass UTF-8 encoded byte strings around. Use unicode objects.
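One way to follow that in practice is to decode any byte strings once, where they enter the program, and work with text everywhere else. A minimal sketch, assuming the incoming bytes are UTF-8; to_text is a hypothetical helper, not something from the question or from a library:

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):   # bytes is an alias of str on Python 2.6+
        return value.decode(encoding)
    return value


raw_keyword = u'ポケモン'.encode('utf-8')   # e.g. bytes read from a file or socket
ad_text = u'黄色のポケモン'                  # already text

# Normalize both sides to text before comparing
found = to_text(raw_keyword).lower() in to_text(ad_text).lower()
```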

Upvotes: 0
