Google BigQuery memory error when using ROW_NUMBER() on a large table: ways to replace a long hash with a short unique identifier

Question

For a query in Google BigQuery, I want to replace a long hash with a shorter numeric unique identifier to save memory in later steps, so I run:

SELECT 
    my_hash
    , ROW_NUMBER() OVER (ORDER BY null) AS id_numeric
FROM hash_table_raw
GROUP BY my_hash

I don't even need an order in the id, but ROW_NUMBER() requires an ORDER BY.

When I try this on my dataset (> 1 billion rows) I get a memory error:

400 Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 126% of limit.
Top memory consumer(s):
  sort operations used for analytic OVER() clauses: 99%
  other/unattributed: 1%

Is there another way to replace a hash with a shorter identifier?

Thanks!

Answers (1)
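
One possible approach, offered as a sketch rather than a definitive fix: the error shows that nearly all of the memory goes to the sort behind the OVER() clause, so an identifier that can be computed per row, without any global ordering, avoids that step entirely. BigQuery's built-in FARM_FINGERPRINT function maps a STRING or BYTES value to an INT64. The example below assumes my_hash is a STRING or BYTES column and reuses the table and column names from the question. Note that a 64-bit fingerprint is deterministic but not guaranteed to be unique, so the second statement checks for collisions before the ids are relied on.

```sql
-- Sketch: derive a 64-bit integer id from each hash without a global sort.
-- Assumes my_hash is STRING or BYTES (the types FARM_FINGERPRINT accepts).
SELECT
    my_hash
    , FARM_FINGERPRINT(my_hash) AS id_numeric
FROM hash_table_raw
GROUP BY my_hash;

-- Collision check: returns one row per id shared by more than one distinct hash.
-- An empty result means every id is unique for this dataset.
SELECT
    id_numeric
    , COUNT(*) AS hashes_sharing_id
FROM (
    SELECT
        my_hash
        , FARM_FINGERPRINT(my_hash) AS id_numeric
    FROM hash_table_raw
    GROUP BY my_hash
)
GROUP BY id_numeric
HAVING COUNT(*) > 1;
```

The resulting ids are not dense (they span the full signed 64-bit range and can be negative), which is fine if the goal is only a shorter join key. If the collision check returns rows, this approach is not safe for the dataset as-is.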