David Laroche
David Laroche

Reputation: 33

MySQL longest prefix matching on 1 million records against 3000 possibilities

I have a table with roughly 1 million phone numbers and another table with roughly 3000 ISD codes (country codes). Now I want to match the phone numbers against all these ISD codes on the longest prefix matching. In the ISD table I could have for example:

1     US
1808  US (Hawaii)

If the phone number is now 1223244223, it should return US, but if its 1808322353 it should return US (Hawaii). What's the best way to achieve this at best performance?

Here is what I have so far. Unfortunately not pleased with the performance and I want to avoid functions:

DELIMITER $$

CREATE DEFINER=`root`@`localhost` FUNCTION `isd`(telnum varchar(32)) RETURNS int(4)
BEGIN
    RETURN (SELECT if(locate(isd, telnum)=1, (locate(isd, telnum)*length(isd)), 0) as score FROM tbl_ref_isd_v1 having score>0 order by score desc limit 1);
END

Furthermore I have this different function, which seems to be a bit quicker:

DELIMITER $$

CREATE DEFINER=`root`@`localhost` FUNCTION `isd_new`(telnum varchar(32)) RETURNS int(4)
BEGIN
    RETURN (
        select isd 
from test.tbl_ref_isd_v1 
where telnum like CONCAT(isd, '%') 
order by length desc LIMIT 1
    );
END

Upvotes: 3

Views: 1049

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270301

Here is a query that does what you want:

select pn.*, max(ic.code)
from (select pn.*, ic.code, len(ic.code)
      from PhoneNumbers pn join
           ISDCodes ic
           on pn.phonenumber like concat(ic.code, '%')
     ) pni
group by pn.phonenumber;

Note that for initial strings, max() works because 1808 is bigger than 1, and so on.

EDIT:

Build an index on ISDCodes(code). Then the following should work well:

select pn.*, coalesce(ic5.code, ic4.code, ic3.code, ic2.code, ic1.code) as code
from PhoneNumbers pn left outer join
     ISDCodes ic1
     on left(pn.phonenumber, 1) = ic1.code left outer join
     ISDCodes ic2
     on left(pn.phonenumber, 2) = ic2.code left outer join
     ISDCodes ic3
     on left(pn.phonenumber, 3) = ic3.code left outer join
     ISDCodes ic4
     on left(pn.phonenumber, 4) = ic4.code left outer join
     ISDCodes ic5
     on left(pn.phonenumber, 5) = ic5.code;

You need to have the joins up to the longer ic.code.

Upvotes: 1

Related Questions