Reputation: 531
I bet this is a stupid question, but here I am.
I'm working on Fedora 21.
From a database I receive the string:
16 de enero de 1979 – 25 de agosto de 2001
and what I want is to split the string using the '-' in the middle.
So I do the following:
text = '16 de enero de 1979 – 25 de agosto de 2001'
Python 2.7.8:
text
# returns: '16 de enero de 1979 \xe2\x80\x93 25 de agosto de 2001'
text.split('-')
# returns ['16 de enero de 1979 \xe2\x80\x93 25 de agosto de 2001']
Python 3.4:
text
# returns: '16 de enero de 1979 – 25 de agosto de 2001'
text.split('-')
# returns: ['16 de enero de 1979 – 25 de agosto de 2001']
I know that the default encoding is UTF-8 in Python 3.x and ASCII in Python 2.x. The thing is that I have never fully understood how we are supposed to handle all these encoding issues.
When I stored this information in my database, I used charset='utf-8'
just to make sure I would not have this kind of hassle. And now that I'm retrieving the information, Python is not handling the encoding well. Or... I'm not handling Python well (most probably).
Thanks in advance.
Upvotes: 0
Views: 1140
Reputation: 414355
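In Python 2, work with Unicode strings rather than byte strings. Put
from __future__ import unicode_literals
at the top of the module, or call: text = utf8bytes.decode('utf-8')
Then split on the en dash, u'\N{EN DASH}':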
>>> u'16 de enero de 1979 – 25 de agosto de 2001'.split(u'\N{EN DASH}')
[u'16 de enero de 1979 ', u' 25 de agosto de 2001']
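A minimal sketch of the approach above, assuming the database driver returns UTF-8 encoded bytes. The name utf8bytes comes from this answer; split_date_range and the sample raw value are hypothetical, introduced only for illustration. It should run on Python 3, and on Python 2.7 because of the __future__ import:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

EN_DASH = '\N{EN DASH}'  # U+2013, not the ASCII hyphen-minus '-'


def split_date_range(utf8bytes):
    """Decode UTF-8 bytes and split on the en dash (hypothetical helper)."""
    text = utf8bytes.decode('utf-8')
    # Strip the spaces that surround the dash in the stored value.
    return [part.strip() for part in text.split(EN_DASH)]


# Assumed input: the raw bytes as they would come back from the database.
raw = b'16 de enero de 1979 \xe2\x80\x93 25 de agosto de 2001'
parts = split_date_range(raw)
print(parts)
```

If the driver already returns a Unicode string, the decode step is unnecessary and text.split(EN_DASH) can be called directly.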
Upvotes: 0
Reputation: 1857
It is not a normal hyphen. It is the Unicode character \u2013 (EN DASH). I tried something like this:
In [70]: text.split('\u2013')
Out[70]: [u'16 de enero de 1979 \u2013 25 de agosto de 2001']
In [71]: text.split(u'-')
Out[71]: [u'16 de enero de 1979 \u2013 25 de agosto de 2001']
In [72]: text.split(u'–')  # here I copied the character from the string
Out[72]: [u'16 de enero de 1979 ', u' 25 de agosto de 2001']
In your case it's not working because the string does not contain - (hyphen), so split() never finds it.
Upvotes: 1
Reputation: 194
The error is that you are splitting on the wrong character.
The character in your original string is a long dash '–' (an en dash, U+2013), while the character in your split argument is a short dash '-' (the ASCII hyphen-minus).
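One way to confirm which character a string actually contains is to ask the standard unicodedata module for its name. This is a small sketch for Python 3; non_ascii_chars is a hypothetical helper, not something from the question:

```python
import unicodedata

text = '16 de enero de 1979 \u2013 25 de agosto de 2001'


def non_ascii_chars(s):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [(ch, 'U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
            for ch in s if ord(ch) > 127]


found = non_ascii_chars(text)
print(found)                  # the dash in the string: U+2013, EN DASH
print(unicodedata.name('-'))  # the character typed in split('-'): HYPHEN-MINUS
```

Seeing the two different names side by side makes it clear why split('-') returns the string unchanged.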
Upvotes: 1