Akos
Akos

Reputation: 69

How to update multiple rows in BigQuery

I use user_id only in a few weeks (before that I had only user_pseudo_id) ", so I want to update user_id values where it is NULL in the early part of the dataset.

I found a solution here, but it does not fit because I have more user_pseudo_id for each user_id

update multiple rows which is having null values

My code:

UPDATE `dataset.events`
    SET user_id = b.user_id
    FROM `dataset.events` a
        INNER JOIN (SELECT DISTINCT user_pseudo_id, user_id 
                    FROM `dataset.events`
                    WHERE user_id IS NOT NULL) b
            ON a.user_pseudo_id = b.user_pseudo_id
     WHERE a.user_id IS NULL

The code is valid but it modified 0 rows and got this popup message: "UPDATE/MERGE must match at most one source row for each target row"

UPDATE: My dateset now:

user_pseudo_id____user_id

a___________________NULL
a___________________NULL
b___________________NULL
c___________________NULL
a___________________111
b___________________111
c___________________222

What I want:

user_pseudo_id____user_id

a___________________111
a___________________111
b___________________111
c___________________222
a___________________111
b___________________111
c___________________222

Note, that users with a and b pseudo_ids are the same users, so they have one user_id.

Upvotes: 3

Views: 11725

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
WITH map AS (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id)
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)   

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' user_pseudo_id, NULL user_id UNION ALL
  SELECT 'a', NULL UNION ALL
  SELECT 'b', NULL UNION ALL
  SELECT 'c', NULL UNION ALL
  SELECT 'a', '111' UNION ALL
  SELECT 'b', '111' UNION ALL
  SELECT 'c', '222' 
), map AS (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id) user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)   

with result

Row user_pseudo_id  user_id  
1   a               111  
2   a               111  
3   b               111  
4   c               222  
5   a               111  
6   b               111  
7   c               222    

Finally you can wrap above into UPDATE syntax as in below example

#standardSQL
UPDATE `project.dataset.table` t
SET user_id = IFNULL(t.user_ID, map.user_id)
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id

Or you can just filter only rows to update where user_id is null as in below example

#standardSQL
UPDATE `project.dataset.table` t
SET user_id = map.user_id
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
AND t.user_ID IS NULL

Upvotes: 2

Related Questions