Kit Sunde
Kit Sunde

Reputation: 37095

Find the row with the largest matching string across two columns

Say I have this:

page_url                      | canvas_url
---------------------------------------------------------------
http://www.google.com/        | http://www.google.com/barfoobaz
http://www.google.com/foo/bar | http://www.google.com/foo

I'd like to find the row that is the start of my string ordered by the longest match. The problem I'm facing is finding the longest matching string, rather than just the row that matches that also has a matching one. I.e.

http://www.google.com/foo matches page_url in row 1 and canvas_url in row 2, but if it's length of both columns rather than a match it would think row 1 is the better match as canvas_url in row 1 is longer.

I could grab all matches and then filter the length in code doing something like:

SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app 
WHERE
    'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%') OR
    'http://www.google.com/foo' LIKE CONCAT(page_url, '%')

Or perform 2 subselections grabbing the top matches of canvas_url respective page_url and then filter that to 1 in code, but I would prefer (barring any ridiculous performance issues) having the database just return what I need.

My immediate concern is MySQL but I need to target SQLite and Postgress, so I'd be happy with an answer in either of those.

Suggestions?

Upvotes: 0

Views: 743

Answers (3)

fthiella
fthiella

Reputation: 49089

Maybe you just need something like this?

SELECT page_url as url, LENGTH(page_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%')
UNION
SELECT canvas_url as url, LENGTH(canvas_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
ORDER BY len DESC
LIMIT 1

Upvotes: 1

Tim Lehner
Tim Lehner

Reputation: 15251

This will work to get the longest actual match length (not just the longest url in the record):

-- Get page_url matches
SELECT *, LENGTH(page_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%') -- can't tell from question if this should be reversed
UNION ALL
-- Get canvas_url matches
SELECT *, LENGTH(canvas_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
-- Bring the longest matches to the top
ORDER BY MatchLen DESC -- May need to add a tie-breaker here
LIMIT 1

Here is a running example on SqlFiddle.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270793

If you need to find just the first row, you want an order by and limit. You have to be a bit clever about how to arrange it:

SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app 
WHERE canvas_url like concat('http://www.google.com/foo' '%') OR
      page_url like concat('http://www.google.com/foo', '%')
order by (case when canvas_url like concat('http://www.google.com/foo' '%') and
                    page_url like concat('http://www.google.com/foo', '%') and
                    LENGTH(canvas_url) < LENGTH(page_url)
               then LENGTH(page_url)
               when canvas_url like concat('http://www.google.com/foo' '%') and
                    page_url like concat('http://www.google.com/foo', '%') and
                    LENGTH(canvas_url) >= LENGTH(page_url)
               when canvas_url like concat('http://www.google.com/foo' '%')
               then LENGTH(canvas_url)
               else LENGTH(page_url)
          end)
limit 1

This is ordering by the longer of the matching strings, and then returning exactly one row. Note that LIMIT is not standard, so different databases have different mechanisms for returning one row.

Upvotes: 0

Related Questions