Peter Krauss
Peter Krauss

Reputation: 13940

Losing space in text conversion

This example, where intuition expects "foo bar", shows the strange behaviour:

SELECT 'foo'|| '#'::char ||'bar'          -- char ok, foo#bar
SELECT 'foo'|| '#' ||'bar'                -- literal ok 
SELECT 'foo'|| '#'::text ||'bar'          -- text ok 

SELECT 'foo'|| ' '::char ||'bar'          -- STRANGE! LOSTING SPACE!
SELECT ('foo'|| ' '::char ||'bar')='foobar' -- yes, it is true... strange

SELECT 'foo'|| ' '::text ||'bar'          -- text OK
SELECT 'foo'|| (' '::char)::text ||'bar'  -- char-to-text lost!
SELECT 'foo'|| ' ' ||'bar'                -- literal OK

Why does PostgreSQL do that? It is not intuitive, and seems error-prone behaviour.

PS: where does the PostgreSQL guide say (it needs a red alert) something about this?

Upvotes: 2

Views: 178

Answers (1)

JNevill
JNevill

Reputation: 50200

This is one of many reasons it's often recommended to stick with varchar() or, in Postgres text types instead.

SQL standard mandates that CHAR() values are padded with space to fill the remaining bytes. For instance:

'A'::CHAR(5)

Will result in "A " being stored. Now if have another field of different length, but same content:

'A'::CHAR10 = 'A'::CHAR(5)

We would want this to say TRUE, right? So the spaces added for padding to fill the CHAR() have to be trimmed.

This is less of a Postgres question then it is a disk storage or sql standard question. Something has to be written to those bytes on the disk and space is the standard. Some DB's only trim for comparison or conversion, and others, like Postgres, trim for nearly any function.

Since you are casting a space to a CHAR(1) (1 being the default length when none is specified, although it doesn't matter for this quesstion) your space gets lost as padding. This is just one of the caveats of using CHAR(). It's a damned if you do, damned if you don't situation.

Instead cast that thing to a VARCHAR() or TEXT as they are nearly always superior types to CHAR().

Noted in the Postgresql documentation:

Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs and slower sorting. In most situations text or character varying should be used instead.

Upvotes: 4

Related Questions