user48956
user48956

Reputation: 15810

Prioritizing sub-searches, returning results in a reasonable time

I'm trying to build a query in SQL that performs a real-time text search. We've enabled trigram indexes to help speed up the search (We're using Postgres, but I think some of this question is a general SQL problem).

What I'd like is a prioritized list of results

  1. textcol ILIKE 'age%', shortest first (Typically few matches, sorting is fast)
  2. textcol ILIKE '% age%', shortest first (Typically few matches, sorting is fast)
  3. textcol ILIKE '%age%', shortest first (Very many matches, but we likely don't care strongly for short matches).

... results should be ordered by priority, with subsequent duplicates excluded, (and where 'age' is the query string).

We could try:

SELECT * FROM (
  (SELECT * FROM TABLE WHERE textcol ILIKE 'age%' ORDER BY LENGTH(textcol) ) a
  UNION
  (SELECT * FROM TABLE WHERE textcol ILIKE '% age%' ORDER BY LENGTH(textcol) ) a
  UNION
  (SELECT * FROM TABLE WHERE textcol ILIKE '%age%' ORDER BY LENGTH(textcol) ) c
) z
LIMIT 200

But:

To resolve the ordering issues, we can try:

SELECT * FROM (
  (SELECT 1 as priority, textcol FROM TABLE WHERE textcol ILIKE 'age%' ) a
  UNION
  (SELECT 2 as priority, textcol FROM TABLE WHERE textcol ILIKE 'age%' ) b
  UNION
  (SELECT 3 as priority, textcol FROM TABLE WHERE textcol ILIKE 'age%' ) c
) as t
ORDER BY priority, LENGTH(textcol)
LIMIT 200

But again, this is also grossly slow (> 14 seconds) because we're again sorting the large set of matches produced by 3.

We can have the client independently execute each of:

  1. textcol ILIKE 'age%'... ORDER BY LENGTH(textcol) ... LIMIT 200 # shortest first
  2. textcol ILIKE '% age%'... ORDER BY LENGTH(textcol) ...LIMIT 200 # shortest first
  3. textcol ILIKE '%age%' ... LIMIT 200 #(unsorted)

... and the result are received in milliseconds (duplicates resolved on the client). Is there no way to implement this is efficiently in SQL?

Upvotes: 1

Views: 45

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271111

First, I would start with:

SELECT t.*
FROM TABLE t
WHERE textcol ILIKE '%age%'
ORDER BY (textcol ILIKE '% age%') DESC,
         (textcol ILIKE 'age%') DESC,
         LENGTH(textcol)

This might get around some of the performance bottleneck, such as the duplicate elimination.

If that is not good enough, then based on the last query, you can do:

(select t.*
 from table t
 where textcol ilike 'age%'
 order by length(textcol) desc
 limit 200
) union all
(select t.*
 from table t
 where textcol ilike '% age%'
 order by length(textcol) desc
 limit 200
)
union all
(select t.*
 from table t
 where textcol ilike '%age%' and
       not textcol ilike '% age%' and
       not textcol ilike 'age%'
 limit 200
);

Upvotes: 1

Related Questions