Reputation: 63
I am creating a REST API using Flask (Python). One of the URLs (/uploads) takes a POST request with a JSON body '{"src":"void", "settings":"my settings"}'. I can extract each field individually and encode it to a byte string, which can then be hashed using hashlib. However, my goal is to take the whole object and encode it, something like myfile.encode('utf-8'). Printing myfile displays {u'src':u'void', u'settings':u'my settings'}. Is there any way I can take the above Unicode object and encode it to a UTF-8 sequence of bytes for hashlib.sha1(myfile.encode('utf-8'))? Let me know if you need more clarification. Thanks in advance.
fileSRC = request.json['src']
fileSettings = request.json['settings']
myfile = request.json
print myfile
# Hash the filename using SHA-1 from the hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8'))  # this works, but I want myfile to be encoded, not fileSRC
guid = guid_object.hexdigest()  # this works
print guid
Upvotes: 3
Views: 5563
Reputation: 4771
As you said in comments, you solved your issue using:
jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))
But it's important to understand why this works. Flask sends you unicode() for non-ASCII, and str() for ASCII. Dumping the result using JSON will give you consistent results since it abstracts away the internal Python representation, just as if you only had unicode().
In Python 2 (the Python version you're using), you don't need .encode('utf-8') because the default value of ensure_ascii of json.dumps() is True. When you send non-ASCII data to json.dumps(), it will use JSON escape sequences to actually dump ASCII: no need to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", even if ensure_ascii is already True, you could specify it:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
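To make the escaping concrete, here is a minimal sketch of what ensure_ascii changes. The payload is a made-up example (it is not from the question), and the snippet is written for Python 3 so it can be run as-is; the escaping done by ensure_ascii=True is the same idea in Python 2.

```python
import json

# Hypothetical payload with one non-ASCII character ("é").
payload = {"src": "void", "settings": "café"}

# ensure_ascii=True (the default) replaces non-ASCII characters
# with \uXXXX escape sequences, so the result is pure ASCII.
escaped = json.dumps(payload, ensure_ascii=True)

# ensure_ascii=False keeps the original characters in the output.
raw = json.dumps(payload, ensure_ascii=False)

print(escaped)  # {"src": "void", "settings": "caf\u00e9"}
print(raw)      # {"src": "void", "settings": "café"}
```

Because the escaped form contains only ASCII characters, encoding it as ASCII or as UTF-8 yields the same bytes, and therefore the same hash.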
In Python 3, however, passing the result of json.dumps() straight to hashlib.sha1() no longer works. Indeed, json.dumps() returns a str (Unicode text) in Python 3, even if everything in the string is ASCII. But hashlib.sha1 only works on bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
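As a quick check of that behaviour, the following Python 3 sketch hashes the same JSON text twice: once as a str, which is rejected, and once as bytes. The payload and variable names are illustrative only.

```python
import hashlib
import json

# json.dumps() returns a str (Unicode text) in Python 3.
text = json.dumps({"src": "void"}, ensure_ascii=True)

# Passing a str to hashlib.sha1() raises TypeError in Python 3.
try:
    hashlib.sha1(text)
    raised = False
except TypeError:
    raised = True

# Encoding to bytes first works; ASCII is enough because
# ensure_ascii=True guarantees there are no non-ASCII characters.
digest = hashlib.sha1(text.encode("ascii")).hexdigest()

print(raised)       # True
print(len(digest))  # 40
```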
This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.
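Putting the pieces together, a small Python 3 helper might look like the sketch below. The function name payload_guid is hypothetical, and sort_keys=True is an addition that does not appear in the snippets above: it is included on the assumption that you want the same hash regardless of the order in which keys arrive, since json.dumps() otherwise follows the dictionary's own ordering.

```python
import hashlib
import json


def payload_guid(payload):
    """Return the SHA-1 hex digest of a JSON-serializable payload."""
    # sort_keys=True (an assumption, not part of the original answer)
    # makes the serialized text independent of key order.
    text = json.dumps(payload, ensure_ascii=True, sort_keys=True)
    # The text is pure ASCII, so encoding it as ASCII cannot fail.
    return hashlib.sha1(text.encode("ascii")).hexdigest()


a = payload_guid({"src": "void", "settings": "my settings"})
b = payload_guid({"settings": "my settings", "src": "void"})
print(a == b)  # True: key order does not affect the digest
```

In a Flask view this could be called as payload_guid(request.json). If you need the hash to match one computed elsewhere, both sides would also have to agree on the separators and escaping used when serializing.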
Upvotes: 1