Reputation: 179
I want to check if every element in a set contained in any part of rows in a column. For example,
Set A = {5013, aab, 402dha)
Column B
adaf**5013**dad344 23**aab**yyyy zzz**402dha**vuuuda ..... ...
My question is if there is a way I can do that? Or is there a function similar to 'VLookup' in Excel?
I will appreciate any idea!
Upvotes: 1
Views: 566
Reputation: 324701
That's a pretty horrid schema; best results will be achieved by changing it so you store your multiple values in:
all of which allow you to use fairly simple and sane SQL expressions to determine what you want, and are indexable (via regular b-tree indexes for child tables, and via GiST or GIN indexes for arrays and hstore) for better performance on large tables.
It's certainly possible as-is, but performance will be miserable. One way is to use regexp_split_to_array
to convert the column to an array, then use the array operators to test for overlap.
See this SQLFiddle demo, which uses an expanded test set because yours wasn't really adequate to demonstrate the problem.
I've shown both "any of set appears in column" (%%
) and "all of set appears in column" (@>
) since it isn't clear from the question which you want.
Setup:
CREATE TABLE test(gah text);
INSERT INTO test(gah) VALUES
('adaf**5013**dad344'),
('23**aab**yyyy'),
('zzz**402dha**vuuuda'),
('no**matches**here**lalala'),
('5013**aab**402dha'),
('402dha**aab**somethingelse**5013'),
('402dha**aab**5013');
Demo:
regress=> SELECT gah FROM test
WHERE regexp_split_to_array(gah, '\*\*') && ARRAY['5013', 'aab', '402dha'];
gah
----------------------------------
adaf**5013**dad344
23**aab**yyyy
zzz**402dha**vuuuda
5013**aab**402dha
402dha**aab**somethingelse**5013
402dha**aab**5013
(6 rows)
regress=> SELECT gah FROM test
WHERE regexp_split_to_array(gah, '\*\*') @> ARRAY['5013', 'aab', '402dha'];
gah
----------------------------------
5013**aab**402dha
402dha**aab**somethingelse**5013
402dha**aab**5013
(3 rows)
Amazingly, you can actually create an index that will benefit this query by making use of PostgreSQL's support of expression indexes. Of course, just because you can do it doesn't mean it's a good idea:
regress=> CREATE INDEX test_glah_resplit_gin ON test
USING GIN(( regexp_split_to_array(gah, '\*\*') ));
CREATE INDEX
regress=> -- Only for testing purposes, don't use in production:
regress=> SET enable_seqscan = off;
SET
regress=> explain SELECT gah FROM test WHERE regexp_split_to_array(gah, '\*\*') @> ARRAY['5013', 'aab', '402dha'];
QUERY PLAN
-----------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=16.00..20.02 rows=1 width=32)
Recheck Cond: (regexp_split_to_array(gah, '\*\*'::text) @> '{5013,aab,402dha}'::text[])
-> Bitmap Index Scan on test_glah_resplit_gin (cost=0.00..16.00 rows=1 width=0)
Index Cond: (regexp_split_to_array(gah, '\*\*'::text) @> '{5013,aab,402dha}'::text[])
(4 rows)
regress=> explain SELECT gah FROM test WHERE regexp_split_to_array(gah, '\*\*') && ARRAY['5013', 'aab', '402dha'];
QUERY PLAN
-----------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=16.00..20.02 rows=1 width=32)
Recheck Cond: (regexp_split_to_array(gah, '\*\*'::text) && '{5013,aab,402dha}'::text[])
-> Bitmap Index Scan on test_glah_resplit_gin (cost=0.00..16.00 rows=1 width=0)
Index Cond: (regexp_split_to_array(gah, '\*\*'::text) && '{5013,aab,402dha}'::text[])
(4 rows)
GIN indexes are expensive to update, so you'll pay a significant performance price on insert
/update
if you use this approach. That's true with ordinary arrays; using regexp_split_to_table
to create them on the fly just makes it a little worse. See GIN tips and the intro to GIN indexes.
For example, inserting a million rows into my test table with INSERT INTO test(gah) SELECT 'aaaaabbbbb'||(x::text) FROM generate_series(1,1000000) x;
took 22 seconds with the GIN index in place, and 1.6 seconds after dropping it. That's likely to be a particularly bad case because of the uniformity of the values, though.
Upvotes: 5