dijxtra

Reputation: 2751

convert string representation of unicode to unicode

This piece of Python 2.7 code first correctly prints "1", but then throws "ValueError: invalid literal for int() with base 10: ''".

num = '\x001\x00'
print num
print int(num)

I guess the problem is that type(num) == <type 'str'>, so I don't actually have a Unicode string containing "1", but a byte string that contains a Unicode-encoded representation of "1". Did I get that right?
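
Inspecting the value in a Python 2.7 session does show a plain byte string with NUL bytes around the digit (a minimal diagnostic sketch):

num = '\x001\x00'
print type(num)   # <type 'str'> -- a byte string, not unicode
print repr(num)   # '\x001\x00' -- NUL bytes surround the digit
print len(num)    # 3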

Anyway, how do I convert num to a format which int() will recognise?

Upvotes: 0

Views: 140

Answers (2)

user4815162342

Reputation: 154916

The code appears to print 1 correctly because your terminal ignores the binary zeros you are printing before and after the 1.

To convert the string to a number correctly, you first need to know the format of the string. For example, if the format is such that the textual representation of the number is surrounded by binary zeros, then you can convert it with the code from Martijn's answer. Otherwise, the struct module is a useful general tool for such conversions.
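
For instance, a minimal struct sketch, assuming the bytes were actually a packed 4-byte big-endian integer rather than text (the format string '>i' is purely illustrative, not something the question establishes):

import struct

# Illustrative only: a 4-byte big-endian signed integer field.
packed = '\x00\x00\x00\x01'
value, = struct.unpack('>i', packed)  # unpack always returns a tuple
print value  # 1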

Upvotes: 1

Martijn Pieters

Reputation: 1121914

The \x00 bytes are the problem here, not unicode vs. string values. You can strip those off:

int(num.strip('\x00'))

int() only accepts strings containing digits, optionally with a sign (+ or -) and surrounding whitespace; it does not accept a decimal point (that is float()'s job). NULL bytes are not whitespace, even if your terminal ignores them when printing.
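
To illustrate (a short sketch of the behaviour described above, run under Python 2.7):

num = '\x001\x00'
print int('  1  ')            # 1: surrounding whitespace is accepted
print int('+1')               # 1: an explicit sign is accepted
print int(num.strip('\x00'))  # 1: works once the NUL bytes are stripped
# int(num) would still raise ValueError: NUL bytes are not whitespace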

Upvotes: 4
