Reputation: 9340
Let me describe the problem based on the example below. Lets say there is a string "abc12345" (could be any!!!) and there is a table mytable with a column mycolumn of varchar(100).
There are some rows that ends with the last character 5.
There are some rows that ends with the last characters 45.
There are some rows that ends with the last characters 345
There are no rows that ends with the last characters 2345.
In this case these rows should be selected:
SELECT * FROM mytable WHERE mycolumn LIKE "%345"
That's because "345" is the longest right substring of "abc12345" that occurs at least once as the right substring of at least one string in the mycolumn column. Any ideas how to write it in one query? Thank you.
Upvotes: 4
Views: 455
Reputation: 5326
If you cannot restructure the table I would approach the problem this way:
Write an aggregate UDF LONGEST_SUFFIX_MATCH(col, str)
in C (see an example in sql/udf_example.c
in the MySQL source, search for avgcost
)
SELECT @longest_match:=LONGEST_SUFFIX_MATCH(mycol, "abcd12345") FROM mytbl; SELECT * FROM mytbl WHERE mycol LIKE CONCAT('%', SUBSTR("abcd12345", -@longest_match))
If you could restructure the table, I do not have a complete solution yet, but the first thing I would add a special column mycol_rev
obtained by reversing the string (via REVERSE()
function) and create a key on it, then use that key for lookups. Will post a full solution when I have a moment.
Update:
If you can add a reversed column with a key on it:
This solution should be very fast as long as you do not have too many rows coming back. We will have a total of O(log(L)) queries where L is the length of the search string each of those being a B-tree search with the read of just one row followed by another B-tree search with the index read of only the needed rows.
Upvotes: 1
Reputation: 4729
Interesting puzzle :)
The hardest problem here is finding what is the length of the target suffix matching your suffix pattern.
In MySQL you probably need to use either generating series or a UDF. Others proposed these already.
In PostgreSQL and other systems that provide regexp-based substring, you can use the following trick:
select v,
reverse(
substring(
reverse(v) || '#' || reverse('abcdefg')
from '^(.*).*#\1.*'
)) res
from table;
What it does is:
#
in between the strings that's important, you need a character that doesn't exist in your string.substring
, such that
^
(.*)
.*
#
(.*)
to be present right after #
. So we use \1
.*
Once you have the longest suffix, finding maximum length, and then finding all strings having the suffix of that length is trivial.
Here's a SQLFiddle using PostgreSQL:
Upvotes: 1
Reputation: 1269953
This is a brute force method:
select t.*
from (select t.*,
dense_rank() over (order by (case when mycolumn like '%abc12345' then 1
when mycolumn like '%bc12345' then 2
when mycolumn like '%c12345' then 3
when mycolumn like '%12345' then 4
when mycolumn like '%2345' then 5
when mycolumn like '%345' then 6
when mycolumn like '%45' then 7
when mycolumn like '%5' then 8
end)
) as seqnum
where mycolumn like '%5' -- ensure at least one match
from t
) t
where seqnum = 1;
This then inspires something like this:
select t.*
from (select t.*, max(i) over () as maxi
from t join
(select str, generate_series(1, length(str)) as i
from (select 'abc12345' as str) s
) s
on left(t.mycolumn, i) = left(str, i)
) t
where i = maxi;
Upvotes: 2