Reputation: 1389
Here's a snippet of code where a string is to be UTF-16 encoded and sent on the wire:
# -*- coding: utf8-*-
import unit_test_utils
import os
import sys
...
...
def run():
    test_dir = unit_test_utils.get_test_dir("test")
    try:
        file_name = u'débárquér.txt'
        open_req = createrequest.CreateRequest(factory)
        open_req.create_disp_ = defines.FILE_OPEN_IF
        open_req.file_name_ = '%s\\%s' % (test_dir, file_name)
        res = unit_test_utils.test_send(client, open_req)
        ....
        ....
    finally:
        client.close()

if __name__ == '__main__':
    run()
When this is run, the error is as follows:
# python /root/python/tests/unicode_test.py
Traceback (most recent call last):
  File "/root/python/tests/unicode_test.py", line 47, in <module>
    run()
  File "/root/python/tests/unicode_test.py", line 29, in run
    res = unit_test_utils.test_send(client, open_req)
  File "/root/python/unit_test_utils.py", line 336, in test_send
    handle_class=handle_class)
  File "/root/python/unit_test_utils.py", line 321, in test_async_send
    test_handle_class(handle_class, expected_status))
  File "/root/usr/lib/python2.7/site-packages/client.py", line 220, in async_send
    return self._async_send(msg, function, handle_class, pdu_splits)
  File "/root/usr/lib/python2.7/site-packages/client.py", line 239, in _async_send
    data, handle = self._handle_request(msg, function, handle_class)
  File "/root/usr/lib/python2.7/site-packages/client.py", line 461, in _handle_request
    return handler(self, msg, *args, **kwargs)
  File "/root/usr/lib/python2.7/site-packages/client.py", line 473, in _common_request
    msg.encode(buf, smb_ver=2)
  File "/root/usr/lib/python2.7/site-packages/message.py", line 17, in encode
    new_offset = composite.Composite.encode(self, buf, offset, **kwargs)
  File "/root/usr/lib/python2.7/site-packages/pycifs/composite.py", line 36, in encode
    new_offset = self._encode(buf, offset, **kwargs)
  File "/root/usr/lib/python2.7/site-packages/packets/createrequest.py", line 128, in _encode
    offset = self._file_name.encode(self._file_name_value(**kwargs), buf, offset, **kwargs)
  File "/root/usr/lib/python2.7/site-packages/fields/unicode.py", line 76, in encode
    buf.append(_UTF16_ENC(value)[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 8: ordinal not in range(128)
What is wrong with the code?
When I tried this exercise locally, things seemed fine:
$ python
Python 2.6.6 (r266:84292, Jul 22 2015, 16:47:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> file_name = 'débárquér.txt'
>>> print type(file_name)
<type 'str'>
>>> utf16_filename = file_name.decode('utf8').encode('UTF-16LE')
>>> print type(utf16_filename)
<type 'str'>
>>> utf16_filename.decode('UTF-16LE')
u'd\xe9b\xe1rqu\xe9r.txt'
Upvotes: 0
Views: 2361
Reputation: 10450
Try replacing:
utf16_filename = file_name.decode('utf8').encode('UTF-16LE')
with
utf16_filename = unicode(file_name.decode('utf8')).encode('UTF-16LE')
Upvotes: -1
Reputation: 177971
When working with Unicode text, convert incoming byte strings to Unicode as soon as you can, work with Unicode text in the script, then convert back to byte strings as late as you can.
You've got a mix of byte strings in different encodings and the likely cause of trouble is this line:
open_req.file_name_ = '%s\\%s' % (test_dir, utf16_filename)
It is unclear what encoding test_dir is in, but the format string is an ASCII byte string, and utf16_filename is a UTF-16LE-encoded byte string. The result will be a mix of encodings.
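To make the "mix of encodings" concrete, here is a minimal sketch. It assumes the directory part is the plain ASCII string test (the real value of test_dir is not shown in the question), and it uses escape sequences and explicit b'' / u'' literals so it runs unchanged on Python 2.7 and Python 3.

```python
name = u'd\xe9b\xe1rqu\xe9r.txt'          # Unicode text (débárquér.txt)
utf16_name = name.encode('utf-16-le')     # UTF-16LE byte string

# ASCII bytes glued onto UTF-16LE bytes: two encodings in one byte string.
# b'test\\' is an assumed stand-in for whatever test_dir really contains.
mixed = b'test\\' + utf16_name

# Joining as text first and encoding once keeps a single encoding.
clean = (u'test\\' + name).encode('utf-16-le')

# The mixed byte string is not valid UTF-16LE as a whole.
try:
    mixed.decode('utf-16-le')
    mixed_decodes = True
except UnicodeDecodeError:
    mixed_decodes = False
```

Only clean round-trips back to the original text; mixed has an odd number of bytes and cannot be decoded as UTF-16LE at all.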
Instead, determine what encoding test_dir is in, decode it to Unicode (if it is not already), and use Unicode strings everywhere. Here's an example:
test_dir = unit_test_utils.get_test_dir("test")
# If not already Unicode, decode it. You need to know its encoding;
# "encoding" below is a placeholder for that value.
if isinstance(test_dir, str):
    test_dir = test_dir.decode(encoding)
file_name = u'débárquér.txt' # Unicode string!
open_req = createrequest.CreateRequest(factory)
open_req.create_disp_ = defines.FILE_OPEN_IF
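# This would work, and always joins with a backslash...
# fullname = u'%s\\%s' % (test_dir, file_name)
# os.path.join is tidier, but it uses the local separator ("/" on Linux).
# If the backslash is required, keep the line above or use ntpath.join.
fullname = os.path.join(test_dir, file_name)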
# I assume UTF-16LE is required for "file_name_" at this point.
open_req.file_name_ = fullname.encode('utf-16le')
res = unit_test_utils.test_send(client, open_req)
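The same "decode early, encode late" idea can be packaged as two small helpers. This is only a sketch: to_text and build_wire_name are hypothetical names, not part of the asker's library, and the UTF-8 default for incoming byte strings is an assumption. It is written to run on both Python 2.7 and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):      # bytes is an alias for str on Python 2
        return value.decode(encoding)
    return value


def build_wire_name(directory, file_name, encoding='utf-8'):
    """Join directory and file name as text, then encode once as UTF-16LE."""
    full_name = u'%s\\%s' % (to_text(directory, encoding),
                             to_text(file_name, encoding))
    return full_name.encode('utf-16-le')


# The directory arrives as bytes, the file name as Unicode text.
wire = build_wire_name(b'test', u'd\xe9b\xe1rqu\xe9r.txt')
```

All joining happens on Unicode text, and the single encode call at the end is the only place where bytes are produced.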
Upvotes: 2
Reputation: 5751
Do not assign text to byte strings. In Python 2 that means you have to use unicode literals:
file_name = u'débárquér.txt' # <-- unicode literal
utf16_filename = file_name.encode('UTF-16LE')
Then make sure you accurately declare the encoding of your source file.
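For what it's worth, the UnicodeDecodeError from an *encode* call usually means a byte string reached the encoder: Python 2 first decodes it implicitly with the ASCII codec, and that step is what fails. The sketch below performs that decode explicitly so it behaves the same on Python 2.7 and Python 3. Using codecs.getencoder as a stand-in for _UTF16_ENC is an assumption based on the `[0]` indexing in the traceback, not something the question confirms.

```python
import codecs

# Assumed stand-in for _UTF16_ENC: returns a (bytes, length_consumed) tuple.
utf16_encode = codecs.getencoder('utf-16-le')

# A byte string containing non-ASCII data (UTF-8 here, as an example).
raw = u'd\xe9b\xe1rqu\xe9r.txt'.encode('utf-8')

# Python 2 does this ASCII decode implicitly when a byte string is encoded.
try:
    raw.decode('ascii')
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decoding explicitly with the right codec first avoids the problem.
text = raw.decode('utf-8')
encoded, consumed = utf16_encode(text)
```

If the value handed to the encoder is already a Unicode string, the implicit ASCII step never happens.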
Upvotes: 2