Thanos Makris
Thanos Makris

Reputation: 3115

BigQuery: Get latest row in a table only if a field has a specific value

Let's assume that we have a table in BigQuery with the following data:

ColA | ColB | ColC | ColD
  x  |   y  |  1   |   5
  x  |   y  |  0   |   6
  k  |   z  |  1   |   4

ColA, ColB are String, ColC is Integer and ColD timestamp.

I need to get the latest row (LAST(ColD)) for each unique value of ColA only if ColC is equal to 1. That is, in the above table, the result of the query should be:

ColA | ColB | ColC | ColD
  k  |   z  |  1   |   4

Rows 1 and 2 should not be included in the results, since for ColA=x the last entry in the table (Row 2) had ColC=0.

I have tried various queries using aggregate functions (e.g. MAX, LAST) in combination with subqueries but was not able to reach to the desired result.

Upvotes: 2

Views: 4451

Answers (1)

Pentium10
Pentium10

Reputation: 208022

The proper name for the analytic function in SQL Standard is FIRST_VALUE. FIRST is aggregation function in BigQuery

So first you obtain the last ColC for each ColA ordered by the timestamp

SELECT
  ColA,
  ColB,
  ColC,
  time,
  FIRST_VALUE(ColC) OVER (PARTITION BY ColA ORDER BY time DESC) AS last_Col_C
FROM (
  SELECT
    'x' AS ColA,
    'y' AS ColB,
    1 AS ColC,
    5 AS time),
  (
  SELECT
    'x' AS ColA,
    'y' AS ColB,
    0 AS ColC,
    6 AS time),
  (
  SELECT
    'k' AS ColA,
    'z' AS ColB,
    1 AS ColC,
    4 AS time)

Then you further select what interests you, final query looks like:

SELECT
  ColA,
  ColB,
  ColC,
  time
FROM (
--the above query here--)
    where last_Col_C=1

Upvotes: 5

Related Questions