wisentini
wisentini

Reputation: 71

Return rows where array column has match for every pattern in given array

I have the following table:

╔════╦════════════════════════════════════════╗
║ id ║                 value                  ║
╠════╬════════════════════════════════════════╣
║  1 ║ ['friend', 'apple', 'cat']             ║
║  2 ║ ['cat', 'friend', 'dog']               ║
║  3 ║ ['pasta', 'best-friend', 'lizard']     ║
║  4 ║ ['wildcat', 'potato', 'alices-friend'] ║
╚════╩════════════════════════════════════════╝

My goal is to return all rows where value contains a given array. For example:

  1. ['friend', 'cat'] should return rows 1 and 2.

  2. ['%friend%', '%cat%'] should return rows 1, 2 and 4.

Currently I'm using this command:

SELECT DISTINCT id, value
FROM table
WHERE value @> (ARRAY['friend', 'cat']::VARCHAR[]);

But it's not working for example 2 listed above, when (array['%friend%', '%cat%']::varchar[]).

As it works for example 1, I think the problem is with the % symbols, but I don't know how to handle this, since I don't need to explicitly match the values.

DBFiddle

Upvotes: 3

Views: 2769

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656734

You want a match in the array column value for every LIKE pattern in the given array of matches.

This query is tricky for two main reasons:

  1. There is no array operator to compare a whole array to an array of LIKE patterns. (No "array contains" operator with pattern-matching.) The array column must be unnested.

  2. It's not enough to simply count matches after unnesting, as one pattern can match multiple times, masking the absence of matches for another.

Rephrase the task like this:

"Return all rows where none of the input patterns fails to find a match."

This query implements it, as efficiently as possible:

SELECT t.id, t.value
FROM   tbl t
WHERE  NOT EXISTS (
   SELECT FROM unnest('{%friend%, %cat%}'::text[]) AS p(pattern)
   WHERE  NOT EXISTS (
      SELECT FROM unnest(t.value) AS a(elem)
      WHERE  a.elem LIKE p.pattern
      )
   );

db<>fiddle here

Unfortunately, no index support possible. You'd have to normalize your relational design to allow that - with a many-to-one table replacing the array value.

Asides

Either way, to optimize performance, fork two distinct cases: search with and without special LIKE characters. Just check for the existence of characters with special meaning, i.e. one of \%_. Related:

Your simple query can deal with plain equality - after sanitizing it:

SELECT id, value
FROM   tbl
WHERE  value @> '{friend, cat}';

DISTINCT(id), value was just a misleading, equivalent syntax variant of DISTINCT id, value. Are you confusing this with DISTINCT ON? See:

And, assuming id is the PK, then DISTINCT is just an expensive no-op in the given query. Remove it.

Finally, use text[] rather than varchar[]. There are corner cases where text[] is superior, text being the "preferred" string type. See:

Upvotes: 2

Related Questions