bigDataArtist
bigDataArtist

Reputation: 181

How to process Row number window function on incremental data

I have a table which as row number window function running for some IDs. Now every time a new data comes its a full load and the new row numbers are assigned to them again. So Row Num runs on the entire data set again , which is quite ineffeciet as lot of resources get consumed and it makes it CPU intensive. This table is built every 15 to 30 mins. I am trying to achieve the same thing but using incremental and then add the result of the incremental to the last row_count of a particular customer_ID

So when new record comes , I want to save the max row_num for that particular record lets say max_row_num = 4 , now two new record comes for a ID , so row_num for incremental is 1,2. Final output should be 4+1 and 4+2 something. so the new row number looks like 1,2,3,4,5,6 adding 1 and 2 to the max of the previous Row_num.

I want to implement the logic in my Pyspark actually! But I am open to python solution and then may be convert to pyspark DataFrame later.

Please help and suggest the possible solutions

Full load -- intial table

Row_num customer_ID
1 ABC123
2 ABC123
3 ABC123
1 ABC125
2 ABC125
1 ABC225
2 ABC225
3 ABC225
4 ABC225
5 ABC225

incremental load

Row_num customer_ID
1 ABC123
2 ABC123
1 ABC125
1 ABC225
2 ABC225
1 ABC330

DESIRED OUPUT

Row_num customer_ID
1 ABC123
2 ABC123
3 ABC123
4 ABC123
1 ABC125
2 ABC125
3 ABC125
1 ABC225
2 ABC225
3 ABC225
4 ABC225
5 ABC225
6 ABC225
1 ABC330

Upvotes: 2

Views: 678

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

If you are trying to insert the values with the new row number, you can join in the maximum existing row number:

insert into full (row_num, customer_id)
    select i.row_number + coalesce(f.max_row_number, 0), i.customer_id
    from incremental i left join
         (select f.customer_id, max(row_number) as max_row_number
          from full f
          group by f.customer_id
         ) f
         on i.customer_id = f.customer_id;

Upvotes: 2

Related Questions