Reputation: 286
SOLVED
I solved the problem; thanks, everyone, for your time.
First of all, these are the requirements:
Hello, I have a bot written in Python, and I would like it to compare two non-English (Unicode) letters.
The problem is that the letters MUST be stored in variables, so I can't use:
u'letter'
Both letters I want to compare MUST be stored in variables.
I have tried:
letter1 == letter2
It shows this warning:
E:\bots\KiDo\KiDo.py:23: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  import sys
and the comparison always returns False, even when the two letters are the same. So I guess it means I'm comparing two Unicode letters.
I also tried:
letter = unicode(letter)
but it shows this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
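To illustrate where this error comes from, here is a minimal Python 3 sketch (in Python 3, bytes plays the role of Python 2's str, and str plays the role of Python 2's unicode). It assumes the letter arrives as UTF-8 bytes; the Arabic letter ي (U+064A) is used only as an example, because its UTF-8 encoding starts with byte 0xd9, the byte named in the error. In Python 2, unicode(letter) decodes with the default ASCII codec, which is the step that fails.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = u'\u064a'.encode('utf-8')   # the letter ي as UTF-8 bytes: b'\xd9\x8a'

# Decoding with the ASCII codec (what Python 2's unicode(letter) does by
# default) fails on the first non-ASCII byte, 0xd9.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)   # 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

# Decoding with the encoding the bytes were actually written in works.
letter = raw.decode('utf-8')
print(letter == u'\u064a')   # True
```

In Python 2 the working line would be letter.decode('utf-8') or unicode(letter, 'utf-8'), again assuming the bot receives UTF-8.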
I have searched all over Google, but all I could find was the u'' prefix, which doesn't work with variables.
Thank you.
Comparison Code:
word1 = parameters.split()[0]
word2 = parameters.split()[1]
word3 = parameters.split()[2]
word4 = parameters.split()[3]
word5 = parameters.split()[4]
if word1[0] == letter:
    if word2[0] == letter:
        if word3[0] == letter:
            if word4[0] == letter:
                if word5[0] == letter:
                    reply(type, source,u'True')
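As a sketch of the same check with the type problem handled, assuming (hypothetically) that parameters and letter may arrive as UTF-8 bytes, the five nested ifs can be collapsed into a single all() over the first five words after decoding everything to text. The function name all_start_with is invented for illustration, and the code targets Python 3.

```python
def all_start_with(parameters, letter, encoding='utf-8'):
    """Return True if the first five words of `parameters` all start with `letter`.

    Hypothetical helper: either argument may be bytes or text; bytes are
    decoded with `encoding` so that only text is ever compared.
    """
    if isinstance(parameters, bytes):
        parameters = parameters.decode(encoding)
    if isinstance(letter, bytes):
        letter = letter.decode(encoding)
    words = parameters.split()[:5]   # split once instead of five times
    # word[0] is a whole letter here, not the first byte of a UTF-8 sequence.
    return len(words) == 5 and all(word[0] == letter for word in words)


ya = u'\u064a'                              # ي
message = u' '.join([ya + u'\u0627'] * 5)   # five words, each starting with ي
print(all_start_with(message.encode('utf-8'), ya.encode('utf-8')))   # True
```

One behavioral difference: the nested ifs raise IndexError when there are fewer than five words, whereas this sketch returns False.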
Upvotes: 2
Views: 17712
Reputation: 177620
I think you are confusing Unicode with an encoding.
Refer to this article: http://www.joelonsoftware.com/articles/Unicode.html
Note the following: UTF-8 is an encoding of Unicode, but it is not Unicode. The # coding: utf-8 declaration at the top of the source below declares the encoding of the source file as saved on disk. a = u'ç' declares a Unicode string. b = 'ç' is a byte string in the source encoding (UTF-8).
Note that repr displays a different, source-like representation of each string so you can tell the difference, and type indicates the object type.
# coding: utf-8
a = u'ç'
b = 'ç'
print a
print b
print repr(a)
print repr(b)
print type(a)
print type(b)
print a==b # Not comparing same types.
print a==b.decode('utf8') # Comparing both as Unicode strings.
print a.encode('utf8')==b # Comparing both as byte strings.
a and b print the same, but are not the same:
ç
ç
u'\xe7'
'\xc3\xa7'
<type 'unicode'>
<type 'str'>
C:\Users\metolone\Desktop\Script1.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
print a==b
False
True
True
Your letter1 and letter2 are two different types of strings.
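For Python 3 readers, here is a rough translation of the experiment above, assuming the usual mapping (Python 2 str becomes bytes, Python 2 unicode becomes str). ascii() is used instead of repr() so the escapes look like the Python 2 output. In Python 3 a text-versus-bytes comparison simply returns False with no UnicodeWarning (Python only emits a BytesWarning when started with the -b flag), so the fix is the same: compare values of the same type.

```python
a = u'\xe7'                 # text: one code point, U+00E7 (ç)
b = a.encode('utf-8')       # bytes: the two UTF-8 bytes of ç

print(ascii(a), type(a))    # '\xe7' <class 'str'>
print(ascii(b), type(b))    # b'\xc3\xa7' <class 'bytes'>

print(a == b)                   # False: text and bytes never compare equal
print(a == b.decode('utf-8'))   # True: both sides are text
print(a.encode('utf-8') == b)   # True: both sides are bytes
```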
Here's a complete example reading a word list from a file and taking input from a user:
import sys
import codecs
# The word list was saved in UTF-8 encoding. It can be in any encoding
# as long as the correct one is specified when reading it in.
# `codecs.open` will convert the input to Unicode.
with codecs.open('words.txt','r',encoding='utf8') as f:
    word_list = f.read().strip().splitlines()
print 'word_list and type:',word_list,type(word_list[0])
# Different consoles can have different input encodings. Let's see what it is.
print 'My terminal encoding:',sys.stdin.encoding
# Read a word in the input encoding. We'll convert to Unicode later.
word = raw_input('Word? ')
print 'word, content and type:',word,repr(word),type(word)
# Now decode the input to Unicode.
word = word.decode(sys.stdin.encoding)
print 'converted word, content and type:',word,repr(word),type(word)
# Compare the two Unicode strings
print 'Comparison:',word in word_list
Output from a US Windows console. Note that different consoles have different encodings: Linux is usually UTF-8, and non-US Windows consoles can be different.
word_list and type: [u'\ufeffadi\xf3s', u'ping\xfcino'] <type 'unicode'>
My terminal encoding: cp437
Word? pingüino
word, content and type: pingüino 'ping\x81ino' <type 'str'>
converted word, content and type: pingüino u'ping\xfcino' <type 'unicode'>
Comparison: True
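In the output above, the first word comes back as u'\ufeffadi\xf3s': the file was apparently saved with a UTF-8 byte order mark (BOM), and the utf8 codec keeps it, so looking up u'adiós' in word_list would return False. One way around this is the utf-8-sig codec, which strips a leading BOM if there is one. A Python 3 sketch, in which a temporary file is a hypothetical stand-in for words.txt:

```python
import io
import os
import tempfile

# Hypothetical stand-in for words.txt, saved as UTF-8 *with* a BOM,
# as the output above suggests the original file was.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with io.open(path, 'w', encoding='utf-8-sig') as f:
    f.write(u'adi\xf3s\nping\xfcino\n')

# Plain utf-8 keeps the BOM as U+FEFF at the start of the first word
# (strip() does not remove it, since U+FEFF is not whitespace).
with io.open(path, encoding='utf-8') as f:
    with_bom = f.read().strip().splitlines()

# utf-8-sig removes a leading BOM (and reads BOM-less files unchanged).
with io.open(path, encoding='utf-8-sig') as f:
    word_list = f.read().strip().splitlines()

print(ascii(with_bom[0]))          # '\ufeffadi\xf3s'
print(u'adi\xf3s' in word_list)    # True
```

In the Python 2 example, passing encoding='utf-8-sig' to codecs.open should have the same effect.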
Upvotes: 0
Reputation: 28370
If you need to compare single letters, you could always compare their actual values using ord(a)==ord(b).
In answer to the example posted:
>>> def check(b):
... a = u'ي'
... return (b==a, ord(a), ord(b), ord(a)==ord(b))
...
>>> check(u'ي')
(True, 1610, 1610, True)
>>>
You do need to be consistent in marking unicode as unicode, i.e. putting the u before the quotes.
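ord() only accepts a single character, so it does not sidestep the type problem on its own: it returns the code point of a one-character text string, but a UTF-8-encoded Arabic letter is two bytes. A Python 3 sketch (bytes standing in for Python 2's str), using ي only as an example:

```python
letter = u'\u064a'                 # ي as text
encoded = letter.encode('utf-8')   # the same letter as UTF-8 bytes (2 bytes)

print(ord(letter))                 # 1610, i.e. U+064A

# The encoded form is two bytes long, so ord() rejects it.
try:
    ord(encoded)
except TypeError as exc:
    print('TypeError:', exc)

# Taking the "first character" of the bytes yields only the lead byte,
# whose value (0xd9 == 217) is unrelated to the code point.
print(ord(encoded[:1]))            # 217

# Decode first; then ord() (or plain ==) behaves as expected.
print(ord(encoded.decode('utf-8')) == ord(letter))   # True
```

The slice encoded[:1] mirrors what word[0] does on a Python 2 byte string; in Python 3, indexing bytes returns an int instead.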
Upvotes: -1
Reputation: 1319
Look, the letter ç (a character that is not present in ASCII) may be represented as a str object or as a unicode object (maybe you are a little confused about what Unicode means).
Also, if you are trying to create a unicode object from characters that are not in the ASCII table, you must specify another encoding:
unicode('ç')
This will raise a UnicodeDecodeError because 'ç' is not in ASCII, but
unicode('ç', encoding='utf-8')
will work, because the bytes of 'ç' in a UTF-8 source file are valid UTF-8 (as your Arabic letters may be).
You can compare unicode objects with unicode objects in the same way you compare str objects with str objects, and both work fine.
You can also compare a str object with a unicode object, but this is error-prone for non-ASCII characters: 'ç' as a UTF-8 str is '\xc3\xa7', but as unicode it is just u'\xe7', so the comparison returns False.
So @Karsa may well be right: the problem is with your 'variables' (in Python, a better word is objects). You must make sure you are comparing only str objects or only unicode objects.
So, better code could be:
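One way to follow that advice is to normalize everything to text at the point where data enters the program. A minimal sketch; the name to_text is hypothetical and UTF-8 is an assumed default. It runs on Python 3, and also on Python 2.6+, where bytes is an alias of str:

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as text, decoding byte strings with `encoding`.

    Hypothetical helper: text passes through unchanged; bytes are decoded,
    and invalid bytes raise UnicodeDecodeError instead of silently
    comparing as unequal.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both sides are normalized before comparing, so mixed inputs still match.
print(to_text(b'\xc3\xa7') == to_text(u'\xe7'))   # True
```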
#-*- coding: utf-8 -*-
def compare_first_letter(phrase, compare_letter):
    # making all unicode objects, with utf-8 codec
    compare_letter = unicode(compare_letter,encoding='utf-8')
    phrase = unicode(phrase,encoding='utf-8')
    # taking the first letters of each word in phrase
    first_letters = [word[0] for word in phrase.split()]
    # comparing the first letters with the letter you want
    for letter in first_letters:
        if letter != compare_letter:
            return False
    return True # or your reply function

letter = 'ç'
phrase_1 = "one two three four"
phrase_2 = "çarinha çapoca çamuca"
print(compare_first_letter(phrase_1,letter))
print(compare_first_letter(phrase_2,letter))
Upvotes: 3
Reputation: 107287
This is my attempt, based on what you said:
>>> b=u'letter'
>>> a=u'letter'
>>> a==b
True
>>> a=u'letter2'
>>> a==b
False
So I'm sure there is a problem with your variables. I suggest printing them before you compare them, to see what is actually inside them.
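When printing for this purpose, printing the values alone can mislead, because a byte string and a Unicode string may display the same glyph (as the first answer shows). A small debugging sketch, assuming Python 3, that prints each value's type and an escaped form via ascii(); the helper name show is hypothetical:

```python
def show(name, value):
    """Print (and return) a variable's type and ASCII-only escaped form."""
    # ascii() escapes non-ASCII characters, so UTF-8 bytes and decoded text
    # look different even when they would print as the same letter.
    line = '%s: %s %s' % (name, type(value).__name__, ascii(value))
    print(line)
    return line


show('letter1', u'\u064a')                   # letter1: str '\u064a'
show('letter2', u'\u064a'.encode('utf-8'))   # letter2: bytes b'\xd9\x8a'
```

If the two lines show different types, the variables need to be converted to the same type before comparing.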
Upvotes: 0