I am extracting some values from a website, and when I take the text as-is, I get this result:
u'Used Car for Sale \xa0\xa0 - \xa0'
Notice the u prefix.
But when I call .encode("utf-8"),
I get this value:
'Used Car for Sale \xc2\xa0\xc2\xa0 - \xc2\xa0'
Notice there is no u prefix.
Are these two values the same?
I want to store the value. Which one should I store?
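In Python 2, both inherit from basestring, but they are not of the same type: one is unicode and the other is str, so they are not the same value. Comparing them does not raise an error, but Python 2 first tries to decode the str as ASCII; with non-ASCII bytes like these that fails, it emits a UnicodeWarning, and the comparison returns False.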
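In Python 3 the unicode type is simply str and the result of .encode() is bytes, so the two values from the question can be inspected like this (a minimal Python 3 sketch; the variable names are illustrative only):

```python
text = u'Used Car for Sale \xa0\xa0 - \xa0'  # the u'' prefix is optional in Python 3
encoded = text.encode('utf-8')

print(type(text).__name__)     # str   (Python 2 called this unicode)
print(type(encoded).__name__)  # bytes (Python 2 called this str)

# U+00A0 (no-break space) is one character but two bytes in UTF-8: 0xC2 0xA0,
# which is where the \xc2\xa0 pairs in the encoded value come from
print('\xa0'.encode('utf-8'))  # b'\xc2\xa0'
print(len(text), len(encoded))  # 24 27
```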
If you are using Python 3, in which strings are Unicode by default, the following is true:
u'Used Car for Sale \xa0\xa0 - \xa0' == 'Used Car for Sale \xa0\xa0 - \xa0'
but the following is not:
u'Used Car for Sale \xa0\xa0 - \xa0' == 'Used Car for Sale \xa0\xa0 - \xa0'.encode('utf-8')
since the encoded value's type is bytes, and in Python 3 a str never compares equal to a bytes object.
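A quick Python 3 check of those comparisons; decoding the bytes with the same codec gives back an equal str (a minimal sketch, not tied to any particular scraping library):

```python
text = u'Used Car for Sale \xa0\xa0 - \xa0'
encoded = text.encode('utf-8')

print(text == 'Used Car for Sale \xa0\xa0 - \xa0')  # True: both are str in Python 3
print(text == encoded)                  # False: str never equals bytes
print(text == encoded.decode('utf-8'))  # True: decoding restores the original text
```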
How you store it depends on several factors. You may want to preserve the text exactly as you received it, or you may want to clean it up before displaying it somewhere these characters don't matter or just add noise, e.g. by replacing \xa0 (a no-break space) with ordinary spaces.
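If you do decide to clean the text up, here are a few Python 3 options (a sketch, not the only approach). str.replace is the most targeted; unicodedata.normalize('NFKC', ...) also turns U+00A0 into a plain space but rewrites other compatibility characters too; and str.split() with no arguments already treats U+00A0 as whitespace, so it can collapse the runs in one step:

```python
import unicodedata

text = 'Used Car for Sale \xa0\xa0 - \xa0'

# Option 1: swap only the no-break spaces (U+00A0) for ordinary spaces
replaced = text.replace('\xa0', ' ')

# Option 2: NFKC normalization maps U+00A0 to ' ', but it also rewrites
# other compatibility characters (ligatures, full-width forms, ...)
normalized = unicodedata.normalize('NFKC', text)

# Option 3: split() on any Unicode whitespace, then rejoin with single
# spaces; this collapses the runs and trims both ends
collapsed = ' '.join(text.split())

print(repr(collapsed))  # 'Used Car for Sale -'
```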
Also, check out this excellent answer, which explains the difference in detail and may help you reach a decision: Python str vs unicode types
The strings are actually of different types (unicode and str, respectively), so they're not the same.
As for storage, that depends on where and how you store it, but ultimately the text has to be encoded somehow when it is written (and decoded again when you retrieve it).
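A common pattern is to keep the text as a Unicode str inside the program and choose the encoding explicitly at the storage boundary. As an illustration, assuming a plain text file is the store (the file name is hypothetical), Python 3's open() with encoding='utf-8' encodes on write and decodes on read:

```python
import os
import tempfile

text = 'Used Car for Sale \xa0\xa0 - \xa0'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'title.txt')  # hypothetical file name

    # Encode on the way out: text mode with an explicit encoding
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # What actually lands on disk is the UTF-8 byte sequence
    with open(path, 'rb') as f:
        raw = f.read()

    # Decode on the way back in, using the same encoding
    with open(path, encoding='utf-8') as f:
        restored = f.read()

print(raw == text.encode('utf-8'))  # True
print(restored == text)             # True
```

Databases work the same way in principle: pass the str to the driver and let it handle the encoding the database is configured for.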