Reputation: 111
i want to get last id
and their rank
(based on order by
date_update
asc
and then order by
again by id desc
) and show id
and rank
of id. i do the query like below:
SELECT id as data,
RANK() OVER (ORDER BY date_update) AS rank
FROM `test.sample`
ORDER BY id DESC
LIMIT 1
and it's work for other table but didn't work some table with large data and get notice:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
i have done read Troubleshooting Error Big Query
and try to remove ORDER BY
but still can't running, what should i do ?
sample data:
id date_update
22 2019-10-04
14 2019-10-01
24 2019-10-03
13 2019-10-02
process :
Rank() Over (Order by date_update)
id date_update rank
14 2019-10-01 1
13 2019-10-02 2
24 2019-10-03 3
22 2019-10-04 4
order by id desc
based on above
id date_update rank
24 2019-10-03 3
22 2019-10-04 4
14 2019-10-01 1
13 2019-10-02 2
this is the expected result:
id rank
24 3
Upvotes: 1
Views: 479
Reputation: 2365
You can use the query below. It basically finds the row with max ID (latest ID), then queries the source table again using date_value of max id row as a filter.
WITH
`test.sample` AS
(
select 22 AS id, DATE('2019-10-04') as date_update union all
select 14 AS id, DATE('2019-10-01') as date_update union all
select 24 AS id, DATE('2019-10-03') as date_update union all
select 13 AS id, DATE('2019-10-02') as date_update
),
max_id_row AS
(
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] vals
FROM `test.sample`
)
SELECT m.vals.id, m.vals.date_update, COUNT(*) as rank
FROM `test.sample` as t
JOIN max_id_row as m
ON t.date_update <= m.vals.date_update
GROUP BY 1,2
Upvotes: 3
Reputation: 172993
Below is for BigQuery Standard SQL and should scale to whatever "large" data you have
#standardSQL
SELECT b.id, COUNT(1) + 1 AS `rank`
FROM `project.dataset.table` a
JOIN (
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
) b
ON a.date_update < b.date_update
GROUP BY id
If to apply for sample data in your question -
WITH `project.dataset.table` AS (
SELECT 22 id, DATE '2019-10-04' date_update UNION ALL
SELECT 14, '2019-10-01' UNION ALL
SELECT 24, '2019-10-03' UNION ALL
SELECT 13, '2019-10-02'
)
result is
Row id rank
1 24 3
The "trick" here is in changing focus from not scalable code with non or badly parallelized operations (RANK) to something that is as simple as COUNT'ing
So, your case (at least as it is presented in question's "process" section) can be rephrased as finding number of rows before the day with highest id - that simple - thus above simple query. Obviously adding "1" to that count gives you exactly what would RANK gave you if worked
Upvotes: 1