I'm fairly new to Postgres and currently using 9.6. While implementing full text search over its jsonb documents I noticed slow search results for nested arrays. Running the query with EXPLAIN showed that no indexes were used. To keep things simple I created a table to investigate:
CREATE TABLE book (
id BIGSERIAL NOT NULL,
data JSONB NOT NULL
);
My available indices:
CREATE INDEX book_author_idx
ON book USING GIN (to_tsvector('english', book.data ->> 'author'));
CREATE INDEX book_author_name_idx
ON book USING GIN (to_tsvector('english', book.data -> 'author' ->> 'name'));
And some data to fill a document:
INSERT INTO book (data)
VALUES (CAST('{"author": [{"id": 0, "name": "Cats"}, ' ||
' {"id": 1, "name": "Dogs"}]}' AS JSONB));
I'm able to search for book elements using the following query, but it doesn't use any index. With my actual data of 120k products it takes around 1200 ms, while comparable indexed searches take 0.2 ms.
EXPLAIN ANALYZE
SELECT
id,
data ->> 'author' AS author
FROM book, jsonb_array_elements(data #> '{author}') author_array
WHERE to_tsvector('english', author_array ->> 'name') @@ to_tsquery('cat');
In contrast, the next query uses book_author_name_idx, but because of the array structure it doesn't find anything.
EXPLAIN ANALYZE
SELECT
id,
data ->> 'author' AS author
FROM book
WHERE to_tsvector('english', data -> 'author' ->> 'name') @@ to_tsquery('cat');
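To see why it comes up empty: data -> 'author' is a JSON array, not an object, and applying ->> 'name' to an array simply yields NULL, so the indexed expression never contains any author names. A quick check against the sample row confirms this:
SELECT data -> 'author' ->> 'name' FROM book;
-- NULL for every row, because 'author' holds an array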
How can I tweak my query to use a full text index? I'm aware that I could create a separate table for authors and only reference their ids, but I'd rather keep all the data in one table for performance.
Upvotes: 2
Views: 1463
With the tips from posz's comments I found a solution. Because the built-in || operator doesn't work the way I need it, I used a custom concat function for tsvectors. I took the code from glittershark on GitHub and changed the to_tsvector configuration from 'default' to 'english' to fit my needs.
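Presumably the issue with || is NULL handling: tsvector concatenation is strict, so the result is NULL as soon as one operand is NULL, which the coalesce calls below guard against. A minimal demonstration (my addition, not from the linked code):
SELECT to_tsvector('english', 'cats') || NULL::tsvector;
-- returns NULL rather than 'cat':1
The NULL-safe wrapper: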
CREATE OR REPLACE FUNCTION concat_tsvectors(tsv1 TSVECTOR, tsv2 TSVECTOR)
RETURNS TSVECTOR AS $$
BEGIN
RETURN coalesce(tsv1, to_tsvector('english', ''))
|| coalesce(tsv2, to_tsvector('english', ''));
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE tsvector_agg (
BASETYPE = TSVECTOR,
SFUNC = concat_tsvectors,
STYPE = TSVECTOR,
INITCOND = ''
);
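As a quick sanity check (my addition, using the sample document from the question), the aggregate can be exercised directly:
SELECT tsvector_agg(to_tsvector('english', author ->> 'name'))
FROM jsonb_array_elements('{"author": [{"id": 0, "name": "Cats"},
                                       {"id": 1, "name": "Dogs"}]}'::jsonb -> 'author') author;
-- expected: 'cat':1 'dog':2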
Here is the custom function I wrote. The input is the document as JSONB and the output is a tsvector with the aggregated author names.
CREATE OR REPLACE FUNCTION author_function(
  IN data JSONB,
  OUT resultNames TSVECTOR
)
RETURNS TSVECTOR AS $$
DECLARE
  authorRecords   RECORD;
  combinedAuthors JSONB [];
  singleAuthor    JSONB;
BEGIN
  -- collect every element of the author array
  FOR authorRecords IN (SELECT value
                        FROM jsonb_array_elements(data #> '{author}'))
  LOOP
    combinedAuthors := combinedAuthors || authorRecords.value;
  END LOOP;
  -- fold all author names into a single tsvector
  FOREACH singleAuthor IN ARRAY coalesce(combinedAuthors, '{}')
  LOOP
    resultNames := concat_tsvectors(resultNames,
                                    to_tsvector('english', singleAuthor ->> 'name'));
  END LOOP;
END; $$
LANGUAGE plpgsql
IMMUTABLE;
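A quick check against the sample row inserted above shows the function at work:
SELECT author_function(data) FROM book;
-- expected: 'cat':1 'dog':2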
Then I set up an index for my book objects. (Note that an expression index requires the underlying function to be IMMUTABLE, which is why author_function is declared that way.)
CREATE INDEX book_author_function_idx
ON book USING GIN (author_function(book.data));
The author names have already been run through to_tsvector('english', ...) inside author_function, so I can query for them like this:
EXPLAIN ANALYZE
SELECT
id,
data ->> 'author' AS author
FROM book
WHERE author_function(book.data) @@ to_tsquery('cat');
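Because author_function folds all the names into a single tsvector, the same index also serves more complex tsquery expressions (standard tsquery syntax, nothing specific to this setup), for example requiring both names:
SELECT
  id,
  data ->> 'author' AS author
FROM book
WHERE author_function(book.data) @@ to_tsquery('cat & dog');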
As a result, queries on my actual data went from 1100-1200 ms down to ~0.5 ms. I'm not sure if this is the best solution, so if you have better suggestions, please let me know.
Upvotes: 1