D.Aquar
D.Aquar

Reputation: 33

How to update the row based on a matched row from the data grouped by in Sparksql

From the table below, update the var_1 and var_2:

  1. -- Group by id,nr

    -- If prod for (type = 'sold' or page = 'bag') = prod for (type = 'gift')

    --- update var_1 and var_2 from the max(pg_nr) of the matching row

My_Table

    +---+---+-----+----+----+----+----+-----+------+
    | id| nr|pg_nr|type|list|page|prod|var_1| var_2|
    +---+---+-----+----+----+----+----+-----+------+
    | v1|  1|    1|load| 250| bag|SE92| ball|  blue|
    | v1|  1|    2|load| 250| bag|SFW2| null|  null|
    | v1|  1|    3|view| 250| prd|PUF2| null|  null|
    | v1|  1|    4|gift| 203| prd|SE92| ball|orange|
    | v1|  1|    4|gift| 203| prd|MMA3| ball|   red|
    | v1|  1|    5|view| 250| bag|MNY3| null|  null|
    | v1|  1|    6|sold| 250| prd|SE92| null|  null|
    | v1|  1|    7|gift| 203| prd|MM03| ball|orange|
    | v1|  1|    7|gift| 203| prd|MMX3| ball|   red|
    | v1|  1|    8|view| 250| bag|MNY3| null|  null|
    | v2|  1|    1|load| 250| bag|SE92| null|  null|
    +---+---+-----+----+----+----+----+-----+------+

Output

+---+---+-----+----+----+----+----+-----+------+
| id| nr|pg_nr|type|list|page|prod|var_1| var_2|
+---+---+-----+----+----+----+----+-----+------+
| v1|  1|    1|load| 250| bag|SE92| ball|  blue|
| v1|  1|    2|load| 250| bag|SFW2| null|  null|
| v1|  1|    3|view| 250| prd|PUF2| null|  null|
| v1|  1|    4|gift| 203| prd|SE92| ball|orange|
| v1|  1|    4|gift| 203| prd|MMA3| ball|   red|
| v1|  1|    5|view| 250| bag|MNY3| null|  null|
| v1|  1|    6|sold| 250| prd|SE92| ball|orange| <--
| v1|  1|    7|gift| 203| prd|MM03| ball|orange| 
| v1|  1|    7|gift| 203| prd|MMX3| ball|   red|
| v1|  1|    8|view| 250| bag|MNY3| null|  null|
| v2|  1|    1|load| 250| bag|SE92| null|  null|
+---+---+-----+----+----+----+----+-----+------+

Upvotes: 0

Views: 31

Answers (1)

Venkat
Venkat

Reputation: 571

From the below query update the var_1 and var_2 columns in the My_Table table. Then joined with the main table using matching criteria. The rows in My_Table are updated using the SET clause.

UPDATE My_Table t1
SET var_1 = t2.var_1, var_2 = t2.var_2
FROM (
    SELECT id, nr, MAX(pg_nr) AS max_pg_nr, prod AS max_prod, var_1 AS max_var_1, var_2 AS max_var_2
    FROM My_Table
    WHERE (type = 'sold' OR page = 'bag')
    GROUP BY id, nr
    HAVING prod = (
        SELECT prod
        FROM My_Table
        WHERE type = 'gift'
    )
) t2
WHERE t1.id = t2.id AND t1.nr = t2.nr AND t1.pg_nr = t2.max_pg_nr;

Upvotes: 0

Related Questions