Reputation: 2790
I've got a variable:
age_expectations = dictionary['looking_for']['age']
print type(age_expectations), age_expectations
The output is:
<type 'unicode'> 22‑35
When I try to split it on the dash, I run into the following problem:
res = age_expectations.split('-')
print res
And the output looks like:
[u'22\u201135']
Instead of:
["22", "35"]
What is the problem? I've tried many combinations of encoding and decoding, but I don't really understand how they work. Does the problem come from the split?
Upvotes: 2
Views: 1658
Reputation: 77912
As you can see from your output, the hyphen in your age_expectations variable is the Unicode character U+2011 (non-breaking hyphen), not the standard "-" hyphen. You would have seen it from the start if you had printed the variable's representation instead:
>>> uu = u"22\u201135"
>>> print uu
22‑35
>>> print repr(uu)
u'22\u201135'
>>>
So you need to either replace the u"\u2011" character with a simple hyphen before splitting (if your data can contain either character) or simply split the string on u"\u2011" (if you're sure you'll always get this as the delimiter).
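A minimal sketch of the first option, assuming the only two delimiters that can show up are the ASCII hyphen and U+2011. The helper name `split_age_range` is made up for illustration and is not part of your code:

```python
NON_BREAKING_HYPHEN = u"\u2011"

def split_age_range(text):
    """Hypothetical helper: normalize U+2011 to "-" and then split on "-"."""
    # Assumption: only "-" and U+2011 ever appear as the separator.
    normalized = text.replace(NON_BREAKING_HYPHEN, u"-")
    return normalized.split(u"-")

# Both spellings of the range now split the same way.
parts_from_nb_hyphen = split_age_range(u"22\u201135")
parts_from_ascii = split_age_range(u"22-35")
```

With this, it no longer matters which of the two characters the data source used.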
Upvotes: 1
Reputation: 9345
Split the unicode string on the actual delimiter character, u'\u2011', like this:
>>> u_code = u'\u0032\u0032\u2011\u0033\u0035'
>>> print u_code
22‑35
>>> u_code.split('-')
[u'22\u201135']
>>> u_code.split(u'\u2011')
[u'22', u'35']
>>>
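If you can't be sure which dash the data contains, one possible variation is to split on a small set of dash-like characters with `re.split`. This is only a sketch: the set of characters below is my assumption about what might appear, and `split_on_any_dash` is a hypothetical name.

```python
import re

# Assumed set of separators: ASCII hyphen, U+2010 (hyphen),
# U+2011 (non-breaking hyphen) and U+2013 (en dash).
DASHES = re.compile(u"[-\u2010\u2011\u2013]")

def split_on_any_dash(text):
    """Hypothetical helper: split on any of the dash characters above."""
    return DASHES.split(text)

parts = split_on_any_dash(u"22\u201135")
```

Add further characters to the class if your data turns out to use other dashes.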
Upvotes: 2