Array 1 contained in array 2 and elements in the same order

Question

Does PostgreSQL have some way for me to find if an array is contained within another array, but with the same ordering?
For example, I want to know if array1 is within array2 with matching elements in the same order.

array1[1, 3, 6, 8]
array2[3, 8, 2, 9, 10, 1, 6]

Obviously not the case in the example, but is there a built-in method for this in PostgreSQL or should I create my own function?

Version of PostgreSQL is 9.6. The actual numbers the query will run on are bigints.

Erwin Brandstetter · Accepted Answer

General case

All elements of the second array are in the first, too. In the same order, but gaps are allowed.

I suggest this polymorphic PL/pgSQL function:

CREATE OR REPLACE FUNCTION array_contains_array_in_order(arr1 ANYARRAY
                                                       , arr2 ANYARRAY
                                                       , elem ANYELEMENT = NULL)
  RETURNS bool AS
$func$
DECLARE
   pos int := 1;
BEGIN
   FOREACH elem in ARRAY arr2
   LOOP
      pos := pos + array_position(arr1[pos:], elem);  -- see below
      IF pos IS NULL THEN
         RETURN FALSE;
      END IF;
   END LOOP; 

   RETURN true; --  all elements found in order
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 3000;

As @a_horse commented, we can omit the upper bound in array subscripts to mean "unbounded" (arr1[pos:]). In older versions before 9.6 substitute with arr1[pos:2147483647] - 2147483647 = 2^31 - 1 being the theoretical max array index, the greatest signed int4 number.

This works for ...

any 1-dimensional array type, not just integer[].
arrays with NULL values, thanks to array_position() which works for NULL, too.
arrays with duplicate elements.
only for default array subscripts starting with 1. You can easily cover non-standard subscripts if needed:
Normalize array subscripts for 1-dimensional array so they start with 1

About the ANYELEMENT trick:

How do I get the type of an array's elements?

Performance

I ran a quick performance test comparing this function to the one @a_horse supplied. This one was around 5x faster.

If you use this a filter for a big table I strongly suggest you combine it with a (logically redundant) sargable filter like:

SELECT *
FROM   tbl
WHERE  arr @> '{2,160,134,58,149,111}'::int[] 
AND    array_contains_array_in_order(arr, '{2,160,134,58,149,111}')

This will use a GIN index on the array column like:

CREATE INDEX ON tbl USING gin (arr);

And only filter remaining (typically very few!) arrays that share all elements. Typically much faster.

Caveats with intarray module

Note: applies to integer[] exclusively, not smallint[] or bigint[] or any other array type!

Careful, if you have installed the intarray extension, which provides its own variant of the @> operator for int[]. Either you create an (additional) GIN index with its special operator class (which is a bit faster where applicable):

CREATE INDEX ON intarr USING gin (arr gin__int_ops);

Or, while you only have a GIN index with the default operator class, you must explicitly denote the standard operator to cooperate with the index:

WHERE  arr OPERATOR(pg_catalog.@>) '{2,160,134,58,149,111}'::int[]

Details:

GIN index on smallint[] column not used or error "operator is not unique"

Simple case

As commented, your case is simpler:

The complete second array is included in the first (same order, no gaps!).

CREATE OR REPLACE FUNCTION array_contains_array_exactly(arr1 ANYARRAY, arr2 ANYARRAY)
  RETURNS bool AS
$func$
DECLARE
   len int := array_length(arr2, 1) - 1;  -- length of arr2 - 1 to fix off-by-1
   pos int;                               -- for current search postition in arr1
BEGIN
   /*                                     -- OPTIONAL, if invalid input possible
   CASE array_length(arr1, 1) >  len      -- array_length(arr2, 1) - 1 
   WHEN TRUE  THEN                        -- valid arrays
      -- do nothing, proceed
   WHEN FALSE THEN                        -- arr1 shorter than arr2
      RETURN FALSE;                       -- or raise exception?
   ELSE                                   -- at least one array empty or NULL
      RETURN NULL;
   END CASE;
   */

   pos := array_position(arr1, arr2[1]);  -- pos of arr2's 1st elem in arr1

   WHILE pos IS NOT NULL
   LOOP
      IF arr1[pos:pos+len] = arr2 THEN    -- array slice matches arr2 *exactly*
         RETURN TRUE;                     -- arr2 is part of arr1
      END IF;

      pos := pos + array_position(arr1[(pos+1):], arr2[1]);
   END LOOP; 

   RETURN FALSE;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000;

Considerably faster than the above for longer arrays. All other considerations still apply.

Array 1 contained in array 2 and elements in the same order

Answers (1)

General case

Performance

Caveats with intarray module

Simple case

Related Questions