Atul
Atul

Reputation: 4320

Find nearest matching strings and its substrings using SQL query

I want to write a query that will fetch nearest matching strings of given string and its sub-strings in that order.

For example, lets say am having table of all names in a column. If I want to search name "ATUL", then results should list all distinct names matching first "ATUL%" then "ATU%" then "AT%" and then "A%" and finally all remaining records.

(Then I am going to pick up first N records out of it based on my needs)

Distinct union of queries is one solution I can think of. Is there any more efficient way to do this?

UPDATE:

Thanks for answers below. Meanwhile I was trying something on my own and found this query producing expected results, provided I have username column indexed

select * FROM all_usernames WHERE (username LIKE 'atul%') or (username LIKE 'atu%') or (username LIKE 'at%') or (username LIKE 'a%') or (username LIKE '%'); 

But is it standard behaviour or is it that I am just getting it coincidently?

Upvotes: 0

Views: 2293

Answers (2)

Timekiller
Timekiller

Reputation: 3126

Well, ATUL%, ATU% and AT% are all subsets of A%, so it's sufficient to select A% to get all your results. The tricky part is ordering them by how many first characters match. There seems to be no easy or elegant way to find this out, so if you want something generic you'll have to write your own function that compares substrings of string 1 and string 2 in a loop until they differ or length of either strings is reached, something like this:

CREATE FUNCTION `compare_first_chars`(str1 varchar(1000), str2 varchar(1000))
  RETURNS int
  DETERMINISTIC
BEGIN
  DECLARE v_offset INT;
  DECLARE v_minlen INT;

  IF str1 is null or str2 is null THEN
    return 0;
  END IF;

  SET v_offset = 0;

  SET v_minlen = least(length(str1), length(str2));

  count_loop: LOOP
    SET v_offset = v_offset + 1;

    IF v_offset > v_minlen THEN
      LEAVE count_loop;
    END IF;

    IF substr(str1, 1, v_offset) != substr(str2, 1, v_offset) THEN
      LEAVE count_loop;
    END IF;

  END LOOP;

  RETURN v_offset-1;
END

then you can order desc by it. If you don't need something that complex, then either use CASE in order, or distinct union as you mentioned in answer.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269953

One method is to use like in the order by:

order by (case when name like 'ATUL%' then 1
               when name like 'ATU%' then 2
               when name like 'AT%' then 3
               when name like 'A%' then 4
               else 5
          end)

A more generic method is also brute force, but could go something like this:

order by (case when left(name, 9) = left('ATUL', 9) then 1
               when left(name, 8) = left('ATUL', 8) then 2
               when left(name, 7) = left('ATUL', 7) then 3
               when left(name, 6) = left('ATUL', 6) then 4
               when left(name, 5) = left('ATUL', 5) then 5
               when left(name, 4) = left('ATUL', 4) then 6
               when left(name, 3) = left('ATUL', 3) then 7
               when left(name, 2) = left('ATUL', 2) then 8
               when left(name, 1) = left('ATUL', 1) then 9
          end)

Upvotes: 2

Related Questions