Edijs Petersons
Edijs Petersons

Reputation: 567

Encoding and decoding in postgresql

Let's say we have a string 'a\b'. I need to encode it first, save to file, then read it from file and puck back in db.

How to encode and decode text that has escape characters?

select encode(E'a\b'::bytea, 'base64')
"YQg="
select decode('YQg=', 'base64')
"a\010"

After decoding I am not getting back string as it was in it's original form.

Upvotes: 6

Views: 33733

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324265

You're using an E'' string (escape string) and casting to bytea. The result will be a representation of that string in your current database encoding - probably UTF-8.

E'a\b' is the character a then the character represented by the escape \b which is ordinal \x08. PostgreSQL represents this string with a hex-escape when printing to the terminal because it's a non-printable character. The string is still two characters long.

postgres=> SELECT E'a\b';
 ?column? 
----------
 a\x08
(1 row)

postgres=> SELECT length(E'a\b');
 length 
--------
      2
(1 row)

The cast to bytea implicitly does a conversion to the current database encoding:

postgres=> SELECT E'a\b'::bytea;
 bytea  
--------
 \x6108
(1 row)

(\x61 is the ASCII ordinal for a in most encodings).

Except you must be on an old PostgreSQL since you have bytea_output = escape, resulting in octal escape output instead:

postgres=> SELECT E'a\b'::bytea;
 bytea 
-------
 a\010
(1 row)

You need to decode the bytea back into a text string, e.g.

convert_from(decode('YQg=', 'base64'), 'utf-8')

... and even then the nonprintable character \b will be printed as \x08 by psql. You can verify that it is really that character inside the database using another client.

BTW, what's going on would be clearer if you instead explicitly encoded it when you stored it rather than relying on a cast to bytea:

encode(convert_to(E'a\b', 'utf-8'), bytea)

Upvotes: 11

Related Questions