A Confusing Case in Python

Question

I'm new to PyQt and there is something I don't get. I made an app using hashlib, with a GUI built in PyQt.

 self.pushButton.connect(self.pushButton,QtCore.SIGNAL("clicked()"),self.clickedButton)

and the click handler:

def clickedButton(self):
        if self.comboBox.currentText() == "MD5":
            self.MD5(self.lineEdit.text())

and MD5:

 def MD5(self,text):
        self.hash = hashlib.md5(text).hexdigest()
        self.textEdit.setText(self.hash)

the result for "hello": a6f145a01ad0127e555c051d15806eb5

No error. It looks OK. But trying the same thing in the Python shell:

 >>> print hashlib.md5("hello").hexdigest()
 5d41402abc4b2a76b9719d911017c592
 >>>

Is this an error, or why am I getting different results?

Matteo Italia · Accepted Answer

The problem is that you are passing a QString to md5, not a regular Python string. From some experimentation, I can see that a regular string and a unicode string produce the same result in Python; from this I can tell that it converts the unicode version to a sequence of "narrow" characters using the ascii codec.

>>> print hashlib.md5("hello").hexdigest()
5d41402abc4b2a76b9719d911017c592
>>> print hashlib.md5(u"hello").hexdigest()
5d41402abc4b2a76b9719d911017c592

The QString, instead, gets hashed differently, as in your program:

>>> a=PyQt4.QtCore.QString("hello")
>>> print hashlib.md5(a).hexdigest()
a6f145a01ad0127e555c051d15806eb5

although its UTF-8 and Latin-1 representations (both identical to the output of the ascii codec, since we are dealing only with ASCII letters) are hashed the same way as Python strings:

>>> print hashlib.md5(a.toUtf8()).hexdigest()
5d41402abc4b2a76b9719d911017c592
>>> print hashlib.md5(a.toLatin1()).hexdigest()
5d41402abc4b2a76b9719d911017c592

Probably the hashing algorithm is working on the internal representation of the QString, which is UTF-16 and therefore differs from the UTF-8 representation, producing a different digest.

Thus, the lesson here is that before hashing any text you have to choose its encoding explicitly and pass the encoded bytes to the hashing function, which works only on bytes, since there ain't no such thing as plain text.
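To make that concrete, here is a minimal sketch of hashing with an explicitly chosen encoding. It is written for Python 3 (the transcripts in this answer are Python 2), and md5_hex is a hypothetical helper name, not something from hashlib or PyQt:

```python
import hashlib

def md5_hex(text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

# Same text, two encodings: the bytes differ, so the digests differ too.
utf8_digest = md5_hex("hello")
utf16_digest = md5_hex("hello", "utf-16-le")

print(utf8_digest)                  # 5d41402abc4b2a76b9719d911017c592
print(utf8_digest == utf16_digest)  # False
```

Whichever encoding you pick, the same one has to be used everywhere the digests are compared.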

Edit: it is not actually hashing the UTF-16 representation, otherwise you would get the same digest as this:

>>> print hashlib.md5(u'hello'.encode('utf_16')).hexdigest()
25af7f84a93a6cf5cb00967c60910c7d

It is hashing something else. Still, the point stands: hashlib is not designed to work with QString, so it produces "strange" output. Again, convert the QString to a narrow string in a suitable encoding before hashing it.
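As a rough sketch of that conversion, the helper below accepts either a QString or a plain Python string and always hashes UTF-8 bytes. The name md5_of_text is made up for illustration, the code targets Python 3, and it assumes that bytes() accepts the QByteArray returned by toUtf8(); I have not checked that against every PyQt version:

```python
import hashlib

def md5_of_text(text):
    """Hypothetical helper: hash a QString or a Python str via UTF-8 bytes."""
    if hasattr(text, "toUtf8"):
        # QString path: toUtf8() is the same call used earlier in this answer.
        # Assumption: bytes() can copy the returned QByteArray.
        data = bytes(text.toUtf8())
    else:
        # Plain Python 3 str path.
        data = text.encode("utf-8")
    return hashlib.md5(data).hexdigest()

print(md5_of_text("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

In the question's MD5 method, this would mean hashing the converted value of self.lineEdit.text() rather than the QString itself.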
