Reputation: 106
I want to save a history of user statuses.
For this I have a table with two colums: user_identifier and status.
user_identifier is a string, status is a repeated record with key:value pairs: date and status.
When a user changes a status (for example from active to inactive), I would like to update this table and add the new status, while keeping the old one.
This is the table schema:
[
{
"description": "user identifier",
"mode": "REQUIRED",
"name": "user_id",
"type": "STRING"
},
{
"description": "status - can be either sent or pending, initial state is pending",
"mode": "REPEATED",
"name": "status",
"type": "RECORD",
"fields": [
{
"name": "status_date",
"type": "DATE",
"mode": "REQUIRED"
},
{
"name": "value",
"type": "STRING",
"mode": "REQUIRED"
}
]
}
]
Is this even possible to insert a new user status in this schema? Should I redesign the schema? How to do it properly in BigQuery, to utilize its nesting abilities?
Upvotes: 3
Views: 3331
Reputation: 173028
Below is for BigQuery Standard SQL and assumes you have a statuses table project.dataset.statuses
as you described in your question and you have updates table project.dataset.updates
where you accumulate updates for periodical updates of statuses table
So dummy data could look like
WITH `project.dataset.statuses` AS (
SELECT 'a' user_id, [STRUCT<status_date DATE, value STRING>('2018-11-03', 'pending')] status UNION ALL
SELECT 'b', [STRUCT<status_date DATE, value STRING>('2018-11-04', 'pending')] UNION ALL
SELECT 'c', []
), `project.dataset.updates` AS (
SELECT 'a' user_id, [STRUCT<status_date DATE, value STRING>('2018-11-05', 'sent')] new_statuses UNION ALL
SELECT 'c', [STRUCT<status_date DATE, value STRING>('2018-11-05', 'pending')]
)
where updates table has exact same schema and represents new updates that needs to be added to main table
Below SELECT returns joined statuses
#standardSQL
SELECT
t.user_id,
IF(u.user_id IS NULL, status, ARRAY_CONCAT(status, new_statuses)) status
FROM `project.dataset.statuses` t
LEFT JOIN `project.dataset.updates` u
ON t.user_id = u.user_id
You can use below DDL to "update" statuses table with them
#standardSQL
CREATE OR REPLACE TABLE `project.dataset.statuses` AS
SELECT
t.user_id,
IF(u.user_id IS NULL, status, ARRAY_CONCAT(status, new_statuses)) status
FROM `project.dataset.statuses` t
LEFT JOIN `project.dataset.updates` u
ON t.user_id = u.user_id
If to apply to dummy data
statuses:
Row user_id status.status_date status.value
1 a 2018-11-03 pending
2 b 2018-11-04 pending
3 c
updates:
Row user_id new_statuses.status_date new_statuses.value
1 a 2018-11-05 sent
3 c 2018-11-05 pending
result will be as
Row user_id status.status_date status.value
1 a 2018-11-03 pending
2018-11-05 sent
2 b 2018-11-04 pending
3 c 2018-11-05 pending
In case if updates
table can consists of new users that are not yet in main table - below will handle this scenario
#standardSQL
-- CREATE OR REPLACE TABLE `project.dataset.statuses` AS
SELECT
IFNULL(t.user_id, u.user_id) user_id,
CASE
WHEN t.user_id = u.user_id THEN ARRAY_CONCAT(status, new_statuses)
WHEN t.user_id IS NULL THEN new_statuses
WHEN u.user_id IS NULL THEN status
END status
FROM `project.dataset.statuses` t
FULL JOIN `project.dataset.updates` u
ON t.user_id = u.user_id
Upvotes: 2