Reputation: 21
I'm struggling to achieve the following.
I have a table of postal codes that includes full and partial postal codes. It will have other columns also.
In another table, I have locations, all which have full postal codes. My locations table will also have other columns.
What I need to achieve is a JOIN between postal_codes and locations such that one row per location remains, where that row has the values from the postal_codes table for the row that matches the longest string of characters for the postal code.
If I only have 1 location I can achieve this with a double-nested SELECT where the innermost does an ORDER BY the length DESC and the SELECT around that does a WHERE ROWNUM =1, but this approach won't work if I'm trying to match other locations at the same time.
I need an efficient approach as the postal_codes table has over 1 million rows.
I cannot add, remove, or alter the postal_codes table.
Any ideas would be appreciated!
Upvotes: 2
Views: 42
Reputation: 7876
One of the options to do it is using recursive cte getting out substrings of postal code that could be used as a condition in join's ON clause:
WITH -- S a m p l e D a t a :
postal_codes AS
( Select 1 as id, 'A0A' as pc_lookup From Dual Union All
Select 2, 'A0A0' From Dual Union All
Select 3, 'A0A0A' From Dual Union All
Select 4, 'A0A0A1' From Dual Union All
Select 5, 'A0A0A2' From Dual Union All
Select 6, 'A0B' From Dual Union All
Select 7, 'A0B0A' From Dual Union All
Select 8, 'A0B0A0' From Dual Union All
Select 9, 'A0B0A1' From Dual Union All
Select 10, 'A0B0A2' From Dual Union All
Select 11, 'A0B0A3' From Dual Union All
Select 12, 'A0C0' From Dual Union All
Select 13, 'A0C1' From Dual Union All
Select 14, 'A0C1A1' From Dual Union All
Select 15, 'B' From Dual Union All
Select 16, 'B2' From Dual Union All
Select 17, 'B3A' From Dual
),
locations AS
( Select 1 as id, 'A0A0A2' as postal_code From Dual Union All
Select 2, 'A0A0A4' From Dual Union All
Select 3, 'A0A0B1' From Dual Union All
Select 4, 'A0B0A3' From Dual Union All
Select 5, 'A0C1A2' From Dual Union All
Select 6, 'B1B1B1' From Dual
),
... recursive cte (grid) generating possible join strings for postal code
grid (cnt, id, postal_code, lookup_code) AS
( Select 1 as cnt,
l.id, l.postal_code, SubStr(l.postal_code, 1, 7 - 1) as lookup_code
From locations l
UNION ALL
Select g.cnt + 1,
g.id, g.postal_code, SubStr(g.lookup_code, 1, 7 - (g.cnt + 1))
From grid g
Where cnt < 6
)
-- M a i n S Q L :
SELECT loc_id, loc_postal_code, pc_id, pc_lookup
FROM ( Select g.id as loc_id, g.postal_code as loc_postal_code, pc.id as pc_id, pc.pc_lookup,
Max( 7 - cnt ) Over(Partition By g.id) as max_lngth
From postal_codes pc
Inner Join grid g ON( g.lookup_code = pc.pc_lookup )
)
WHERE Length(pc_lookup) = max_lngth
ORDER BY loc_id
/* R e s u l t :
LOC_ID LOC_POSTAL_CODE PC_ID PC_LOOKUP
---------- --------------- ---------- ---------
1 A0A0A2 5 A0A0A2
2 A0A0A4 3 A0A0A
3 A0A0B1 2 A0A0
4 A0B0A3 11 A0B0A3
5 A0C1A2 13 A0C1
6 B1B1B1 15 B */
Upvotes: 0
Reputation: 312219
I'd join the two tables with a like
operator, and then use row_number
to take the match with the longest pclookup
:
SELECT loc_id, loc_postal_code, pc_id, pclookup
FROM (SELECT loc.id AS loc_id,
loc.postal_code AS loc_postal_code
pc.id AS pc_id,
pc.pclookup AS pclookup,
ROW_NUBER() OVER (PARTITION BY loc.id
ORDER BY LENGTH(pc.pclookup) DESC) AS rn
FROM locations loc
JOIN postal_codes pc ON loc.postal_code LIKE pc.pclookup || '%') t
WHERE rn = 1
Upvotes: 0