Reputation: 105
Does PostgreSQL have some way for me to find if an array is contained within another array, but with the same ordering?
For example, I want to know if array1 is within array2 with matching elements in the same order.
array1[1, 3, 6, 8]
array2[3, 8, 2, 9, 10, 1, 6]
Obviously not the case in the example, but is there a built-in method for this in PostgreSQL or should I create my own function?
Version of PostgreSQL is 9.6. The actual numbers the query will run on are bigints.
Upvotes: 2
Views: 176
Reputation: 656952
All elements of the second array are in the first, too. In the same order, but gaps are allowed.
I suggest this polymorphic PL/pgSQL function:
CREATE OR REPLACE FUNCTION array_contains_array_in_order(arr1 ANYARRAY
, arr2 ANYARRAY
, elem ANYELEMENT = NULL)
RETURNS bool AS
$func$
DECLARE
pos int := 1;
BEGIN
FOREACH elem in ARRAY arr2
LOOP
pos := pos + array_position(arr1[pos:], elem); -- see below
IF pos IS NULL THEN
RETURN FALSE;
END IF;
END LOOP;
RETURN true; -- all elements found in order
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 3000;
As @a_horse commented, we can omit the upper bound in array subscripts to mean "unbounded" (arr1[pos:]
). In older versions before 9.6 substitute with arr1[pos:2147483647]
- 2147483647 = 2^31 - 1 being the theoretical max array index, the greatest signed int4 number.
This works for ...
any 1-dimensional array type, not just integer[]
.
arrays with NULL values, thanks to array_position()
which works for NULL, too.
arrays with duplicate elements.
only for default array subscripts starting with 1. You can easily cover non-standard subscripts if needed:
Normalize array subscripts for 1-dimensional array so they start with 1
About the ANYELEMENT
trick:
I ran a quick performance test comparing this function to the one @a_horse supplied. This one was around 5x faster.
If you use this a filter for a big table I strongly suggest you combine it with a (logically redundant) sargable filter like:
SELECT *
FROM tbl
WHERE arr @> '{2,160,134,58,149,111}'::int[]
AND array_contains_array_in_order(arr, '{2,160,134,58,149,111}')
This will use a GIN index on the array column like:
CREATE INDEX ON tbl USING gin (arr);
And only filter remaining (typically very few!) arrays that share all elements. Typically much faster.
Note: applies to integer[]
exclusively, not smallint[]
or bigint[]
or any other array type!
Careful, if you have installed the intarray extension, which provides its own variant of the @>
operator for int[]
. Either you create an (additional) GIN index with its special operator class (which is a bit faster where applicable):
CREATE INDEX ON intarr USING gin (arr gin__int_ops);
Or, while you only have a GIN index with the default operator class, you must explicitly denote the standard operator to cooperate with the index:
WHERE arr OPERATOR(pg_catalog.@>) '{2,160,134,58,149,111}'::int[]
Details:
As commented, your case is simpler:
The complete second array is included in the first (same order, no gaps!).
CREATE OR REPLACE FUNCTION array_contains_array_exactly(arr1 ANYARRAY, arr2 ANYARRAY)
RETURNS bool AS
$func$
DECLARE
len int := array_length(arr2, 1) - 1; -- length of arr2 - 1 to fix off-by-1
pos int; -- for current search postition in arr1
BEGIN
/* -- OPTIONAL, if invalid input possible
CASE array_length(arr1, 1) > len -- array_length(arr2, 1) - 1
WHEN TRUE THEN -- valid arrays
-- do nothing, proceed
WHEN FALSE THEN -- arr1 shorter than arr2
RETURN FALSE; -- or raise exception?
ELSE -- at least one array empty or NULL
RETURN NULL;
END CASE;
*/
pos := array_position(arr1, arr2[1]); -- pos of arr2's 1st elem in arr1
WHILE pos IS NOT NULL
LOOP
IF arr1[pos:pos+len] = arr2 THEN -- array slice matches arr2 *exactly*
RETURN TRUE; -- arr2 is part of arr1
END IF;
pos := pos + array_position(arr1[(pos+1):], arr2[1]);
END LOOP;
RETURN FALSE;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000;
Considerably faster than the above for longer arrays. All other considerations still apply.
Upvotes: 2