Reputation: 1
I have a database in Firestore, send the information to Bigquery with a extension 'Stream Collections to BigQuery'.
But I have the tables with all the data in the same column called 'data'. Also, the data shows all the records/logs, and I just want to see the latest status.
To be able to see the data separated by columns and see the last value, create this query and them save as view.
SELECT
document_name,
document_id,
timestamp,
event_id,
operation,
created_at,
status_old,
status_new
FROM
(
SELECT
document_name,
document_id,
FIRST_VALUE(timestamp) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS timestamp,
FIRST_VALUE(event_id) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS event_id,
FIRST_VALUE(operation) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS operation,
FIRST_VALUE(`proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp'))) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS created_at,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.old')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS status_old,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.new')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS status_new
FROM
table1
ORDER BY
document_name,
timestamp DESC
)
GROUP BY
document_name,
document_id,
timestamp,
event_id,
operation,
created_at,
status_old,
status_new
(I have several fields to add in addition to these, but it is just an example).
For all my querys, I use to use this 'view'.
The problem I have is that this query /view, takes a long time and cost:
Is there a way to make this more optimal?
Upvotes: 0
Views: 1145
Reputation: 12264
Hope these are helpful:
Remove ORDER BY
in your subquery first. It will slow down your query and doesn't make any difference to the result.
It seems all the FIRST_VALUE
come from same row. If so, you can simplify your query like this.
SELECT document_name,
document_id,
first_values.*
FROM (
SELECT document_name,
document_id,
FIRST_VALUE(STRUCT(
timestamp,
event_id,
operation,
`proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp')) AS created_at,
JSON_EXTRACT_SCALAR(data, '$.metadata.old') AS status_old,
JSON_EXTRACT_SCALAR(data, '$.metadata.new') AS status_new
) OVER (PARTITION BY document_name ORDER BY timestamp DESC) AS first_values
FROM table1
)
GROUP BY
in your outer(main) query.
But I don't think your inner query make duplicates. So, check if your subquery makes duplicates first and if not, remove GROUP BY
from the outer query.Upvotes: 0