minmer
minmer

Reputation: 33

How to cast only the part of a table using a single SQL command in PostgreSQL

In a PostgreSQL table I have several information stored as text. It depends on the context described by a type column what type of information is stored. The application is prepared to get by only one command the Id's of the row.

I got into trouble when i tried to compare the information (bigint stored as a string) with an external value (e.g. '9' > '11'). When I tried to cast the column, the datatbase return an error (not all values in the column are castable, e.g. datetime or normal text). Also when I try to cast only the result of a query command, I get a cast error.

I get the table with the castable rows by this command:

SELECT information.id as id, item.information::bigint as item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'

The resulting rows are showing up only text that is castable. When I throw it into another command it results in an error.

SELECT x.id FROM (
    SELECT information.id as id, item.information::bigint as item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) AS x
WHERE x.item > '0'::bigint

Accroding to the error, the database tried to cast all rows in the table.

Upvotes: 1

Views: 848

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270463

This is unfortunate. Try using a case expression:

SELECT inf.id as id,
       (CASE WHEN inf.type = 'task' THEN i.information::bigint END) as item
FROM information inf JOIN
     item i
     ON inf.id = i.informationid
WHERE inf.type = 'task';

There is no guarantee that the WHERE filter is applied before the SELECT. However, CASE does guarantee the order of evaluation, so it is safe.

Upvotes: 1

COY
COY

Reputation: 704

Technically, this happens because the optimizer thinks WHERE x.item > '0'::bigint is a much more efficient filter than information.type = 'task'. So in the table scan, the WHERE x.item > '0'::bigint condition is chosen to be the predicate. This thinking is not wrong but will make you fall into this seemingly illogical trouble.

The suggestion by Gordon to use CASE WHEN inf.type = 'task' THEN i.information::bigint END can avoid this, but however it may sometimes ruin your idea to put that as a sub-query and require the same condition to be written twice.

A funny trick I tried is to use OUTER APPLY:

SELECT x.* FROM (SELECT 1 AS dummy) dummy
OUTER APPLY (
    SELECT information.id as id, item.information::bigint AS item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) x
WHERE x.item > '0'::bigint

Sorry that I only verified the SQL Server version of this. I understand PostgreSQL has no OUTER APPLY, but the equivalent should be:

SELECT x.* FROM (SELECT 1 AS dummy) dummy
LEFT JOIN LATERAL (
    SELECT information.id as id, item.information::bigint AS item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) x ON true
WHERE x.item > '0'::bigint

(reference is this question)

Finally, a more tidy but less flexible method is add the optimizer hint to turn off it to force the optimizer to run the query as how it is written.

Upvotes: 2

Related Questions