neel.1708
neel.1708

Reputation: 315

ROW_NUMBER() not sequencing records correctly

I had to copy the data from oracle tables to files.

I have a join query which fetches 800k records so i used row_number() function along with order by clause to generate 4 files containing 200k each.

Query :

SELECT * FROM (
    SELECT  ROW_NUMBER() OVER ( order by FILE_KEY desc ) rn,
        FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO 
    FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc 
    WHERE loc.CATALOG_NO = rout.FILE_KEY) 
WHERE rn BETWEEN start AND end;

Parameters:

For 1st File  : start =1 ,end = 200000
For 2nd File  : start =200001 ,end = 400000
For 3rd File  : start =400001 ,end = 600000
For 4th File  : start =600001 ,end = 800000

But when i checked last 10 row using this query in sql query browser and last 10 rows of file are different that is sequence is different in file and sql query browser.

SELECT * FROM (
    SELECT  ROW_NUMBER() OVER(  order by FILE_KEY desc ) rn,
        FILE_KEY,ROUTING_NO,INTLROUT_TYPE,ABBR_COUNTRY_CODE_2D,HO_CATALOG_NO 
    FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc 
    WHERE loc.CATALOG_NO=rout.FILE_KEY) 
WHERE rn BETWEEN 709990 AND 80000;

Upvotes: 2

Views: 1434

Answers (3)

A.B.Cade
A.B.Cade

Reputation: 16915

In the over clause, order by a unique field in BANK_INTL_LOC_TBL:

SELECT * FROM (
    SELECT  ROW_NUMBER() OVER ( order by loc.**LOC_KEY** desc ) rn,
        FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO 
    FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc 
    WHERE loc.CATALOG_NO = rout.FILE_KEY) 
WHERE rn BETWEEN start AND end
ORDER BY rn;

UPDATE: according to @Shannon Severance comment
add the order by clause

Upvotes: 1

rejj
rejj

Reputation: 1216

If you have disk to spare on your Oracle installation (which you should!), then instead of running the inner query 4 times it may end up being faster to do the following

CREATE TABLE bank_data
NOLOGGING
PARALLEL 4
AS SELECT  ROW_NUMBER() OVER ( order by FILE_KEY desc ) rn,
    FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO 
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc 
WHERE loc.CATALOG_NO = rout.FILE_KEY);

The amount of parallelism to use (the number 4 in my example here) will depend on how much concurrent work your database can handle, mostly dependent on the number of CPUs.

After that has finished, (which should take noticably less than 5 hours!) you can then run simple selects on the bank_dump table to pull the records you desire

SELECT *
FROM bank_dump
where rn < 200000

for your first data set, for example.

Upvotes: 0

Florin Ghita
Florin Ghita

Reputation: 17643

This can be because you have something like this

row_number file_key 
799998     same_number
799999     same_number
800000     same_number
800001     same_number
800002     same_number
800003     same_number
800004     same_number

because you order by file_key.

How do you observed that are different data? from your other columns. So, you can use:

SELECT  ROW_NUMBER() OVER(order by FILE_KEY desc, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO ) rn

Or(second cause), your base table had been changen between your querys.

UDPDATE: you can use the use_hash hint to speed up your query. 5 hours is too much for this query.

SELECT * FROM (
    SELECT  /*+use_hash(rout loc)*/
        ROW_NUMBER() OVER(order by FILE_KEY desc, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO ) rn,
        FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO 
    FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc 
    WHERE loc.CATALOG_NO = rout.FILE_KEY) 
WHERE rn BETWEEN start AND end;

Upvotes: 1

Related Questions