python python-2.7

Marco Dinatsoli

Reputation: 10570

Are these two values the same?

I am extracting some values from a website, and when I take the text as it is, I get this result:

u'Used Car for Sale \xa0\xa0 - \xa0'

Notice the u prefix.

But when I call .encode("utf-8"), I get this value:

'Used Car for Sale \xc2\xa0\xc2\xa0 - \xc2\xa0'

Notice there is no u prefix.

Are these two values the same?

I want to store the value. Which one should I store?

Upvotes: 0

Views: 89

Answers (2)

fips

Reputation: 4379

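In Python 2, both inherit from basestring, but they are different types: the first is unicode and the second is str. They are not the same value. Comparing them with == returns False here (with a UnicodeWarning), because the byte string contains non-ASCII bytes that cannot be decoded implicitly.

In Python 3, where strings are Unicode by default, the following is true: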

u'Used Car for Sale \xa0\xa0 - \xa0' == 'Used Car for Sale \xa0\xa0 - \xa0'

but the following is not:

u'Used Car for Sale \xa0\xa0 - \xa0' == 'Used Car for Sale \xa0\xa0 - \xa0'.encode('utf-8')

since the encoded value's type is bytes, and a str never compares equal to a bytes object.
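A minimal sketch of the relationship between the two values, using the string from the question. The variable names are only illustrative. It is written to run on Python 3, and should behave the same way on Python 2, where the two types are called unicode and str instead of str and bytes.

```python
# -*- coding: utf-8 -*-
# Text value: unicode in Python 2, str in Python 3.
text = u'Used Car for Sale \xa0\xa0 - \xa0'

# Encoded value: str in Python 2, bytes in Python 3.
encoded = text.encode('utf-8')

# The two objects have different types.
same_type = type(text) is type(encoded)

# Each U+00A0 (non-breaking space) becomes the two bytes \xc2\xa0 in UTF-8,
# so the encoded form is three bytes longer than the text has characters.
text_length = len(text)
encoded_length = len(encoded)

# Decoding with the same codec recovers the original text exactly.
round_trip_ok = encoded.decode('utf-8') == text

print(same_type, text_length, encoded_length, round_trip_ok)
```

So the two values carry the same information, but only the decoded form compares equal to the original text.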

How you store it depends on what you need. You may want to preserve the text exactly as you received it, or you may want to clean it up before displaying it somewhere the non-breaking spaces don't matter or only add noise, e.g. by replacing \xa0 with ordinary spaces.
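If cleaning up is the goal, something like the following might do. It is only a sketch: whether to keep the spacing or collapse it is an assumption about what you want to display. It works on the text (unicode) value, not the encoded one.

```python
# -*- coding: utf-8 -*-
text = u'Used Car for Sale \xa0\xa0 - \xa0'

# Option 1: swap each non-breaking space for an ordinary space,
# keeping the original amount of spacing.
replaced = text.replace(u'\xa0', u' ')

# Option 2: collapse every run of whitespace into a single space.
# For Unicode text, split() with no arguments treats U+00A0 as whitespace.
collapsed = u' '.join(text.split())

print(repr(replaced))
print(repr(collapsed))
```

Doing this on the text value rather than on the UTF-8 bytes avoids having to match the two-byte sequence \xc2\xa0 by hand.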

Also, see this excellent answer, which explains the difference between the two types in detail and may help you reach a decision: Python str vs unicode types

Upvotes: 1

odnamrataizem

Reputation: 231

The strings are actually of different types (unicode and str respectively), so they're not the same.

As for storage, that depends on where and how you are going to do it, but the text will ultimately have to be encoded somehow when it is written (and decoded again when it is retrieved).
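As a rough illustration of that encode-on-write, decode-on-read cycle, here is a sketch that stores the value in a plain text file. The file name and the choice of a file (rather than a database) are assumptions made for the example. io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u'Used Car for Sale \xa0\xa0 - \xa0'

# Hypothetical location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), 'listing.txt')

# Writing: the file object encodes the text to UTF-8 bytes.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading: the file object decodes the bytes back to text.
with io.open(path, 'r', encoding='utf-8') as f:
    restored = f.read()

# The bytes on disk are the same as text.encode('utf-8').
with io.open(path, 'rb') as f:
    raw = f.read()

print(restored == text, raw == text.encode('utf-8'))
```

In this arrangement the program works with the text value throughout, and the encoded form only exists at the storage boundary.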

Upvotes: 0
