Nagesh Singh Chauhan
Nagesh Singh Chauhan

Reputation: 784

Implement ROW_NUMBER() in beamSQL

I have the below query :

SELECT DISTINCT Summed, ROW_NUMBER () OVER (order by Summed desc) as Rank  from table1

I have to write it in Apache Beam(beamSql). Below is my code :

PCollection<BeamRecord> rec_2_part2 = rec_2.apply(BeamSql.query("SELECT DISTINCT Summed, ROW_NUMBER(Summed) OVER (ORDER BY Summed) Rank1 from PCOLLECTION "));

But I'm getting the below error :

Caused by: java.lang.UnsupportedOperationException: Operator: ROW_NUMBER is not supported yet!

Any idea how to implement ROW_NUMBER() in beamSql ?

Upvotes: 1

Views: 568

Answers (2)

Nagesh Singh Chauhan
Nagesh Singh Chauhan

Reputation: 784

The solution which worked for the above query:

PCollection<BeamRecord> rec_2 = rec_1.apply(BeamSql.query("SELECT max(Summed) as maxed, max(Summed)-10 as least, 'a' as Dummy from PCOLLECTION"));

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521674

Here is one way you can approximate your current query without using ROW_NUMBER:

SELECT
    t1.Summed,
    (SELECT COUNT(*) FROM (SELECT DISTINCT Summed FROM table1) t2
     WHERE t2.Summed >= t1.Summed) AS Rank
FROM
(
    SELECT DISTINCT Summed
    FROM table1
) t1

The basic idea is to first subquery to get a table with only distinct Summed values. Then, use a correlated subquery to simulate the row number. This isn't a very efficient method, but if ROW_NUMBER is not available, then you're stuck with some alternative.

Upvotes: 1

Related Questions