Damian
Damian

Reputation: 461

Update table from results of querying 2 other tables

I have the following problem: There is table LIST with 5 columns:

  1. List item
  2. product_id
  3. refNo
  4. part_number
  5. quantity
  6. state

There are around 100 000 rows in this table. There is also table part and history and they all have part_number column.

Now I need to update LIST.state column with value 1, 2 or 3 where LIST.refNo = some value, depending on the result of the following situation:

If LIST.part_number does not exist in table part.part_number
    set LIST.state to 1.
If it does, if this part_number and corresponding date in history is less than a week,
    set LIST.State to 2. Else set it to 3.

I am using postgreSQL. Any help would be greatly appreciated. So far, I know how to use this simple IF ELSE statement:

DO
$BODY$
BEGIN
    IF EXISTS (SELECT part_number FROM part WHERE refNo=7000) THEN

    ELSE 
        UPDATE LIST set state = 1
    END IF;
END;
$BODY$

I think its not the best approach because now I have to enclose this if with loop and since there are over 200 000 rows in history table, comparing one by one will be super slow.

Maybe I can first select values that exist in part table and do queries on them to get values from history?

Upvotes: 1

Views: 344

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657002

No plpgsql needed here.

For lots of rows at once:

UPDATE list l
SET    state = u.state
FROM (
   SELECT li.pk_col
        , CASE WHEN h.hist_date > (now() - interval '1 week') THEN 2
               WHEN p.part_number IS NULL THEN 1
               ELSE 2
          END AS state
   FROM   list         li
   LEFT   JOIN part    p USING (part_number)
   LEFT   JOIN history h USING (part_number)
   WHERE  li.refNo = <some value>
   ) u
WHERE l.pk_col = u.pk_col   -- insert actual pk column
AND   l.state IS DISTINCT FROM u.state;

Or, faster, but a bit more verbose without subquery:

UPDATE list l
SET    state = CASE WHEN h.hist_date > (now() - interval '1 week') THEN 2
                    WHEN p.part_number IS NULL THEN 1
                    ELSE 2 END
FROM   list         li
LEFT   JOIN part    p USING (part_number)
LEFT   JOIN history h USING (part_number)
WHERE  li.refNo = <some value>
AND    l.pk_col = li.pk_col   -- insert actual pk column
AND    l.state IS DISTINCT FROM 
       CASE WHEN h.hist_date > (now() - interval '1 week') THEN 2
            WHEN p.part_number IS NULL THEN 1
            ELSE 2 END

I am assuming the column history.hist_date, since you did not specify.

Replace pk_col with your actual primary key column(s). You did not specify.

You shouldn't be using unquoted mixed case identifiers like refNo. Either use refno or "refNo".

The last WHERE condition avoids empty updates where state already had the intended value. Omit if that can't happen in your case. Most of the time it's an improvement to performance people tend to forget.

For this to be fast you need indexes on the involved columns. PK columns are indexed automatically. You also need one for every part_number, for list.refNo and for history.hist_date.
Ideally you'd have a multicolumn index on history like:

CREATE INDEX h_multi_idx ON history (part_number, hist_date DESC);

Would trump one index for each column in this case.

Simpler version for few rows

Another way is a correlated subquery - simpler for updating a single or few row:

UPDATE list l
SET    state = COALESCE((
   SELECT CASE WHEN h.hist_date > (now() - interval '1 week') THEN 2 ELSE 3 END
   FROM   part p
   LEFT   JOIN history h USING (part_number)
   WHERE  p.part_number = l.part_number
   ), 1)
WHERE  refNo = <some value>;

For lots of rows at once, correlated subqueries do not scale well.
Note that this updates the row even if state already had the intended value .

Update for comment

For duplicates on (part_number) in history and (assuming) no duplicates on (part_number) in part or list, here is one (of many possible) solutions:

UPDATE list l
SET    state = u.state
FROM (
   SELECT DISTINCT ON (h.part_number)
          li.pk_col
        , CASE WHEN h.hist_date > (now() - interval '1 week') THEN 2
               WHEN p.part_number IS NULL THEN 1
               ELSE 2
          END AS state
   FROM   list         li
   LEFT   JOIN part    p USING (part_number)
   LEFT   JOIN history h USING (part_number)
   WHERE  li.refNo = <some value>
   ORDER  BY h.part_number, h.hist_date DESC NULLS LAST
   ) u
WHERE l.pk_col = u.pk_col   -- insert actual pk column
AND   l.state IS DISTINCT FROM u.state;

There may be more efficient ways. Depends on missing information.

Upvotes: 2

Java Devil
Java Devil

Reputation: 10959

You could do this with three separate queries (I'm sure there is a way to do it in one, like Erwins answer) which updates all values based on the ref no at once.

Like this:

--First
UPDATE LIST 
SET STATE = 1 
WHERE NOT PART_NUM 
IN (SELECT PART_NUM 
    FROM PART) 
AND REF_no = 1000;
--Second
UPDATE LIST
SET STATE = 2
WHERE PART_NUM IN
         (SELECT PART_NUM 
          FROM HISTORY 
          WHERE age(TIME, CURRENT_TIMESTAMP) > (INTERVAL '-7 days'))
AND REF_NO = 1000;
--Third
UPDATE LIST
SET STATE = 3
WHERE STATE NOT IN(1,2)
AND REF_NO = 1000

This is not very elegant but should get it done, cant be certain if it would be the fastest way either.

Here's the SQLFiddle

Upvotes: 1

Related Questions