Reputation: 69
I use user_id only in a few weeks (before that I had only user_pseudo_id) ", so I want to update user_id values where it is NULL in the early part of the dataset.
I found a solution here, but it does not fit because I have more user_pseudo_id for each user_id
update multiple rows which is having null values
My code:
UPDATE `dataset.events`
SET user_id = b.user_id
FROM `dataset.events` a
INNER JOIN (SELECT DISTINCT user_pseudo_id, user_id
FROM `dataset.events`
WHERE user_id IS NOT NULL) b
ON a.user_pseudo_id = b.user_pseudo_id
WHERE a.user_id IS NULL
The code is valid but it modified 0 rows and got this popup message: "UPDATE/MERGE must match at most one source row for each target row"
UPDATE: My dateset now:
user_pseudo_id____user_id
a___________________NULL
a___________________NULL
b___________________NULL
c___________________NULL
a___________________111
b___________________111
c___________________222
What I want:
user_pseudo_id____user_id
a___________________111
a___________________111
b___________________111
c___________________222
a___________________111
b___________________111
c___________________222
Note, that users with a and b pseudo_ids are the same users, so they have one user_id.
Upvotes: 3
Views: 11725
Reputation: 172944
Below is for BigQuery Standard SQL
#standardSQL
WITH map AS (
SELECT user_pseudo_id, MIN(user_id) user_id
FROM `project.dataset.table`
WHERE NOT user_id IS NULL
GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id)
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' user_pseudo_id, NULL user_id UNION ALL
SELECT 'a', NULL UNION ALL
SELECT 'b', NULL UNION ALL
SELECT 'c', NULL UNION ALL
SELECT 'a', '111' UNION ALL
SELECT 'b', '111' UNION ALL
SELECT 'c', '222'
), map AS (
SELECT user_pseudo_id, MIN(user_id) user_id
FROM `project.dataset.table`
WHERE NOT user_id IS NULL
GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id) user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)
with result
Row user_pseudo_id user_id
1 a 111
2 a 111
3 b 111
4 c 222
5 a 111
6 b 111
7 c 222
Finally you can wrap above into UPDATE syntax as in below example
#standardSQL
UPDATE `project.dataset.table` t
SET user_id = IFNULL(t.user_ID, map.user_id)
FROM (
SELECT user_pseudo_id, MIN(user_id) user_id
FROM `project.dataset.table`
WHERE NOT user_id IS NULL
GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
Or you can just filter only rows to update where user_id is null as in below example
#standardSQL
UPDATE `project.dataset.table` t
SET user_id = map.user_id
FROM (
SELECT user_pseudo_id, MIN(user_id) user_id
FROM `project.dataset.table`
WHERE NOT user_id IS NULL
GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
AND t.user_ID IS NULL
Upvotes: 2