Peter T251

Reputation: 41

Bytes string in Python

Would you know by any chance how to get rid of the bytes identifier in front of a string in a Python list? Perhaps there is some global setting that can be amended?

I retrieve a query from Postgres 9.3 and create a list from that query. It looks like Python 3.3 interprets records in columns of type char(4) as if they were byte strings, for example:

Funds[1][1]
b'FND3'
Funds[1][1].__class__
<class 'bytes'>

So the implication is:

Funds[1][1]=='FND3'
False

I have some control over that database so I could change the column type to varchar(4), and it works well:

Funds[1][1]=='FND3'
True

But this is only a temporary solution. The little b has made my life a nightmare for the last two days ;), and I would appreciate your help with this problem.

Thanks and Regards Peter

Upvotes: 1

Views: 803

Answers (2)

abarnert

Reputation: 366203

The b isn't part of the string, any more than the quotes around it are; they're just part of the representation when you print the string out. So, you're chasing the wrong problem, one that doesn't exist.

The problem is that the byte string b'FND3' is not the same thing as the string 'FND3'. In this particular example, that may seem silly, but if you might ever have any non-ASCII characters anywhere, it stops being silly.
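To make the distinction concrete, here is a minimal sketch. The variable raw is a hypothetical stand-in for Funds[1][1]; nothing here touches a database.

```python
raw = b'FND3'  # hypothetical value standing in for Funds[1][1]

# The b prefix and the quotes exist only in the printed representation.
shown = repr(raw)
length = len(raw)  # counts the four bytes F, N, D, 3; the b is not data

# In Python 3, bytes and str never compare equal, even for pure ASCII.
same_as_str = (raw == 'FND3')
same_as_bytes = (raw == b'FND3')

print(shown, length, same_as_str, same_as_bytes)
```

So there is nothing to strip from the value; the comparison fails because the two objects have different types.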

For example, the string 'é' is the same as the byte string b'\xe9' in Latin-1, and it's also the same as the byte string b'\xc3\xa9' in UTF-8. And of course b'\xc3\xa9' is the same as the string 'Ã©' in Latin-1.
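A short sketch of how one character maps to different bytes depending on the encoding, and what happens when bytes are decoded with the wrong one. The variable names are only for illustration.

```python
text = 'é'

# The same character becomes different bytes under different encodings.
latin1_bytes = text.encode('latin-1')
utf8_bytes = text.encode('utf-8')

# Decoding UTF-8 bytes as Latin-1 does not fail; it silently gives other text.
misread = utf8_bytes.decode('latin-1')

print(latin1_bytes, utf8_bytes, misread)
```

This is why Python refuses to guess: the bytes alone do not say which string they represent.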

So, you have to be explicit about what encoding you're using:

Funds[1][1].decode('utf-8')=='FND3'
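If decoding each value at the point of comparison gets tedious, one option is to decode the whole result once. This is only a sketch: decode_rows is a hypothetical helper, the sample list stands in for the real query result, and it assumes the database really sends UTF-8.

```python
def decode_rows(rows, encoding='utf-8'):
    """Return a copy of rows with every bytes value decoded to str."""
    return [
        tuple(v.decode(encoding) if isinstance(v, bytes) else v for v in row)
        for row in rows
    ]

# Hypothetical sample data shaped like the list in the question.
funds = [(1, b'FND1'), (2, b'FND3')]
decoded = decode_rows(funds)

print(decoded[1][1] == 'FND3')
```

Non-bytes values such as integers pass through unchanged, so the helper can be applied to mixed rows.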

But why is PostgreSQL returning you byte strings? Well, that's what a char column is. It's up to the Python bindings to decide what to do with them. Without knowing which of the multiple PostgreSQL bindings you're using, and which version, it's impossible to tell you exactly what to do. But, for example, in recent-ish psycopg, you just have to set an encoding on the connection (e.g., conn.set_client_encoding('UTF-8')); in older versions you had to register a standard typecaster and do some more stuff; in py-postgresql you have to register lambda s: s.decode('utf-8'); and so on.

Upvotes: 1

Veedrac

Reputation: 60237

You have to either manually implement __str__/__repr__ or, if you're willing to take the risk, do some sort of Regex-replace over the string.

Example __repr__:

def stringify(lst):
    # repr(b'FND3') is "b'FND3'"; slicing off the first character drops the b prefix.
    return "[{}]".format(", ".join(repr(x)[1:] if isinstance(x, bytes) else repr(x) for x in lst))

Upvotes: 1
