d_a_c321
d_a_c321

Reputation: 533

In postgresql, how can I fill in missing values within a column?

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John

ID #   | Name
-------|-----
1      | Bob 
1      | (null)
1      | (null)
2      | John
2      | (null)
2      | (null)
`

EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.

Upvotes: 2

Views: 8851

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658887

UPDATE tbl
SET    name = x.name
FROM  (
    SELECT DISTINCT ON (id) id, name
    FROM   tbl
    WHERE  name IS NOT NULL
    ORDER  BY id, name
    ) x
WHERE  x.id = tbl.id
AND    tbl.name IS NULL;

DISTINCT ON does the job alone. Not need for additional aggregation.

In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.

Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.

Upvotes: 1

wildplasser
wildplasser

Reputation: 44250

CREATE TABLE bobjohn
        ( ID INTEGER NOT NULL
        , zname varchar
        );
INSERT INTO bobjohn(id, zname) VALUES
 (1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
        ;

UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
        ;

SELECT * FROM bobjohn;

NOTE: this query will fail if more than one name exists for a given Id. (and it won't touch records for which no non-null name exists)

If you are on a postgres version >-9, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but is easier to write and read (IMHO). The CTE also tackles the duplicate values-problem (in a rather crude way):

        --
        -- CTE's dont work in update queries for Postgres version below 9
        --
WITH uniq AS (
        SELECT DISTINCT id
        -- if there are more than one names for a given Id: pick the lowest
        , min(zname) as zname
        FROM bobjohn
        WHERE zname IS NOT NULL
        GROUP BY id
        )
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
        ;

SELECT * FROM bobjohn;

Upvotes: 3

cdhowie
cdhowie

Reputation: 169403

If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:

UPDATE some_table AS t1
SET name = (
    SELECT name
    FROM some_table AS t2
    WHERE t1.id = t2.id
      AND name IS NOT NULL
    LIMIT 1
)
WHERE name IS NULL;

If you only want to query the table and have this information filled in on the fly, you can use a similar query:

SELECT
    t1.id,
    (
        SELECT name
        FROM some_table AS t2
        WHERE t1.id = t2.id
          AND name IS NOT NULL
        LIMIT 1
    ) AS name

FROM some_table AS t1;

Upvotes: 0

Related Questions