hinafaya
hinafaya

Reputation: 111

Do partial row in BigQuery to get last data and order by id

i want to get last id and their rank (based on order by date_update asc and then order by again by id desc ) and show id and rank of id. i do the query like below:

 SELECT id as data,
                    RANK() OVER (ORDER BY date_update) AS rank
                    FROM `test.sample`
                    ORDER BY id DESC
                    LIMIT 1 

and it's work for other table but didn't work some table with large data and get notice:

Resources exceeded during query execution: The query could not be executed in the allotted memory.

i have done read Troubleshooting Error Big Query and try to remove ORDER BY but still can't running, what should i do ?

sample data:

id     date_update
22     2019-10-04
14     2019-10-01
24     2019-10-03
13     2019-10-02

process :

  1. Rank() Over (Order by date_update)

id date_update rank

14   2019-10-01    1
13   2019-10-02    2
24   2019-10-03    3
22   2019-10-04    4
  1. order by id desc based on above

    id    date_update  rank                                            
    24   2019-10-03   3
    22   2019-10-04   4
    14   2019-10-01   1
    13   2019-10-02   2
    

this is the expected result:

id        rank                                                         
24         3

Upvotes: 1

Views: 479

Answers (2)

Sabri Karagönen
Sabri Karagönen

Reputation: 2365

You can use the query below. It basically finds the row with max ID (latest ID), then queries the source table again using date_value of max id row as a filter.

WITH 
`test.sample` AS
(
  select 22 AS id, DATE('2019-10-04') as date_update union all
  select 14 AS id, DATE('2019-10-01') as date_update union all
  select 24 AS id, DATE('2019-10-03') as date_update union all
  select 13 AS id, DATE('2019-10-02') as date_update 
),
max_id_row AS
(
    SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] vals
    FROM `test.sample`
)
SELECT m.vals.id, m.vals.date_update, COUNT(*) as rank
FROM `test.sample` as t
JOIN max_id_row as m
    ON t.date_update <= m.vals.date_update
GROUP BY 1,2

Upvotes: 3

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL and should scale to whatever "large" data you have

#standardSQL
SELECT b.id, COUNT(1) + 1 AS `rank`
FROM `project.dataset.table` a
JOIN (
  SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)].* 
  FROM `project.dataset.table`
) b
ON a.date_update < b.date_update
GROUP BY id   

If to apply for sample data in your question -

WITH `project.dataset.table` AS (
  SELECT 22 id, DATE '2019-10-04' date_update UNION ALL
  SELECT 14, '2019-10-01' UNION ALL
  SELECT 24, '2019-10-03' UNION ALL
  SELECT 13, '2019-10-02' 
)

result is

Row id  rank     
1   24  3     

The "trick" here is in changing focus from not scalable code with non or badly parallelized operations (RANK) to something that is as simple as COUNT'ing

So, your case (at least as it is presented in question's "process" section) can be rephrased as finding number of rows before the day with highest id - that simple - thus above simple query. Obviously adding "1" to that count gives you exactly what would RANK gave you if worked

Upvotes: 1

Related Questions