user26863736
user26863736

Reputation: 21

JOIN two tables but only preserve results with longest substring match (Oracle SQL)

tables

I'm struggling to achieve the following.

I have a table of postal codes that includes full and partial postal codes. It will have other columns also.

In another table, I have locations, all which have full postal codes. My locations table will also have other columns.

What I need to achieve is a JOIN between postal_codes and locations such that one row per location remains, where that row has the values from the postal_codes table for the row that matches the longest string of characters for the postal code.

If I only have 1 location I can achieve this with a double-nested SELECT where the innermost does an ORDER BY the length DESC and the SELECT around that does a WHERE ROWNUM =1, but this approach won't work if I'm trying to match other locations at the same time.

I need an efficient approach as the postal_codes table has over 1 million rows.

I cannot add, remove, or alter the postal_codes table.

Any ideas would be appreciated!

Upvotes: 2

Views: 42

Answers (2)

d r
d r

Reputation: 7876

One of the options to do it is using recursive cte getting out substrings of postal code that could be used as a condition in join's ON clause:

WITH    --  S a m p l e    D a t a :
    postal_codes AS
      ( Select  1 as id, 'A0A' as pc_lookup     From Dual Union All
        Select  2,       'A0A0'                 From Dual Union All
        Select  3,       'A0A0A'                From Dual Union All
        Select  4,       'A0A0A1'               From Dual Union All
        Select  5,       'A0A0A2'               From Dual Union All
        Select  6,       'A0B'                  From Dual Union All
        Select  7,       'A0B0A'                From Dual Union All
        Select  8,       'A0B0A0'               From Dual Union All
        Select  9,       'A0B0A1'               From Dual Union All
        Select 10,       'A0B0A2'               From Dual Union All
        Select 11,       'A0B0A3'               From Dual Union All
        Select 12,       'A0C0'                 From Dual Union All
        Select 13,       'A0C1'                 From Dual Union All
        Select 14,       'A0C1A1'               From Dual Union All
        Select 15,       'B'                    From Dual Union All
        Select 16,       'B2'                   From Dual Union All
        Select 17,       'B3A'                  From Dual 
      ), 
    locations AS 
      ( Select 1 as id, 'A0A0A2' as postal_code From Dual Union All
        Select 2,       'A0A0A4'                From Dual Union All
        Select 3,       'A0A0B1'                From Dual Union All
        Select 4,       'A0B0A3'                From Dual Union All
        Select 5,       'A0C1A2'                From Dual Union All
        Select 6,       'B1B1B1'                From Dual 
      ), 

... recursive cte (grid) generating possible join strings for postal code

    grid (cnt, id, postal_code, lookup_code) AS
      ( Select     1 as cnt, 
                   l.id, l.postal_code, SubStr(l.postal_code, 1, 7 - 1) as lookup_code
        From       locations l
       UNION ALL
        Select     g.cnt + 1, 
                   g.id, g.postal_code, SubStr(g.lookup_code, 1, 7 - (g.cnt + 1))
        From       grid g
        Where      cnt < 6
      ) 
--      M a i n    S Q L : 
SELECT     loc_id, loc_postal_code, pc_id, pc_lookup
FROM     ( Select     g.id as loc_id, g.postal_code as loc_postal_code, pc.id as pc_id, pc.pc_lookup, 
                      Max( 7 - cnt ) Over(Partition By g.id) as max_lngth
           From       postal_codes pc
           Inner Join grid g ON( g.lookup_code = pc.pc_lookup )
         )
WHERE    Length(pc_lookup) = max_lngth
ORDER BY loc_id
/*      R e s u l t : 
    LOC_ID  LOC_POSTAL_CODE       PC_ID  PC_LOOKUP
----------  ---------------  ----------  ---------
         1  A0A0A2                    5  A0A0A2
         2  A0A0A4                    3  A0A0A 
         3  A0A0B1                    2  A0A0  
         4  A0B0A3                   11  A0B0A3
         5  A0C1A2                   13  A0C1  
         6  B1B1B1                   15  B          */

Upvotes: 0

Mureinik
Mureinik

Reputation: 312219

I'd join the two tables with a like operator, and then use row_number to take the match with the longest pclookup:

SELECT loc_id, loc_postal_code, pc_id, pclookup
FROM   (SELECT loc.id AS loc_id,
               loc.postal_code AS loc_postal_code
               pc.id AS pc_id,
               pc.pclookup AS pclookup,
               ROW_NUBER() OVER (PARTITION BY loc.id 
                                 ORDER BY LENGTH(pc.pclookup) DESC) AS rn
        FROM   locations loc
        JOIN   postal_codes pc ON loc.postal_code LIKE pc.pclookup || '%') t
WHERE  rn = 1

Upvotes: 0

Related Questions