Thanatos
Thanatos

Reputation: 44354

bytea type & nulls, Postgres

I'm using a bytea type in PostgreSQL, which, to my understanding, contains just a series of bytes. However, I can't get it to play well with nulls. For example:

=# select length(E'aa\x00aa'::bytea);
 length
--------
      2
(1 row)

I was expecting 5. Also:

=# select md5(E'aa\x00aa'::bytea);
               md5
----------------------------------
 4124bc0a9335c27f086f24ba207a4912
(1 row)

That's the MD5 of "aa", not "aa\x00aa". Clearly, I'm Doing It Wrong, but I don't know what I'm doing wrong. I'm also on an older version of Postgres (8.1.11) for reasons outside of my control. (I'll see if this behaves the same on the latest Postgres as soon as I get home...)

Upvotes: 3

Views: 11108

Answers (2)

Marcello Nuccio
Marcello Nuccio

Reputation: 3901

You can use regular strings or dollar-quoted strings instead of escaped strings:

# select length('aa\000aa'::bytea);
 length 
════════
      5
(1 row)

# select length($$aa\000aa$$::bytea);
length 
════════
      5
(1 row)

I think that dollar-quoted strings are a better option because, if the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants.

Upvotes: 0

leonbloy
leonbloy

Reputation: 76026

Try this:

# select length(E'aa\\000aa'::bytea);
 length
--------
      5

Updated: Why the original didn't work? First, understand the difference between one slash and two:

pg=# select E'aa\055aa', length(E'aa\055aa') ;
 ?column? | length
----------+--------
 aa-aa    |      5
(1 row)

pg=# select E'aa\\055aa', length(E'aa\\055aa') ;
 ?column? | length
----------+--------
 aa\055aa |      8

In the first case, I'm writing a literal string, 4 characters unescaped('a') and one escaped. The slash is consumed by the parser in a first pass, which converts the full \055 to a single char ('-' in this case).

In the second case, the first slash just escapes the second, the pair \\ is translated by the parser to a single \ and the 055 is seen as three characters.

Now, when converting a text to a bytea, escape characters (in a already parsed or produced text) are parsed/interpreted again! (Yes, this is confusing).

So, when I write

 select E'aa\000aa'::bytea;

in the first parsing, the literal E'aa\000aa' is converted to an internal text with a null character in the third position (and depending on your postgresql version, the null character is interpreted as an EOS, and the text is assumed to be of length two - or in other versions an illegal string error is thrown).

Instead, when I write

 select E'aa\\000aa'::bytea;

in the first parsing, the literal string "aa\000aa" (eight characters) is seen, and is asigned to a text; then in the casting to bytea, it is parsed again, and the sequence of characters '\000' is interpreted as a null byte.

IMO postgresql kind of sucks here.

Upvotes: 9

Related Questions