Reputation: 31
I am working on setting up a vector store in my Supabase database, and I need to execute a script for each new table I create. The script enables the pgvector extension, creates a table, and defines a function to search for documents based on vector similarity.
Here's the script which I need to execute:
-- Check if the extension exists
create extension if not exists vector;
-- Drop the function if it exists
drop function if exists match_documents(vector(1024), int, jsonb);
-- Create a table to store your documents
create table documents (
id bigint primary key generated always as identity,
content text, -- corresponds to Document.pageContent
metadata jsonb, -- corresponds to Document.metadata
embedding vector(1024) -- 1024 works for OpenAI embeddings, change if needed
);
-- Create a function to search for documents
create function match_documents (
query_embedding vector(1024),
match_count int default null,
filter jsonb default '{}'
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
#variable_conflict use_column
begin
return query
select
id,
content,
metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where metadata @> filter
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
The problem is that this script always creates a table named documents. I want to create a function that takes the table name as an input and executes the script, allowing me to create tables with different names dynamically. This function will be called using remote RPC calls from my Supabase Python SDK.
Here is the function I wrote to achieve this:
create or replace function setup_vector_store(table_name text)
returns void
language plpgsql
as $$
begin
-- Enable the pgvector extension
execute 'create extension if not exists vector';
-- Drop the function if it exists
execute 'drop function if exists match_documents(vector(1024), int, jsonb)';
-- Create the table
execute format('
create table %I (
id bigint primary key generated always as identity,
content text,
metadata jsonb,
embedding vector(1024)
)', table_name);
-- Create the search function
execute format('
create function match_documents (
query_embedding vector(1024),
match_count int default null,
filter jsonb default ''{}''
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
begin
return query
select
id,
content,
metadata,
1 - (''%I''.embedding <=> query_embedding) as similarity
from %I
where metadata @> filter
order by ''%I''.embedding <=> query_embedding
limit match_count;
end;
$$;', table_name, table_name, table_name);
end;
$$;
However, when I run this function, I encounter the following error:
ERROR: 42601: syntax error at or near "#"
LINE 32: #variable_conflict use_column
^
I attempted to write such a function but faced issues with its creation. Additionally, if there are any other methods to dynamically create these tables, please suggest them!
Upvotes: 2
Views: 221
Reputation: 26322
In your second example, the $$
that open the function body get matched to the $$
that also open another function definition inside the dynamic SQL execute format(
call, leaving orphaned/misplaced #variable_conflict use_column
, that I'm guessing you had there previously and now it changed for some other confusing error due to mismatched quotes.
Replace them with named dollar quotes: for example, outer ones with $f1$
, inner ones with $f2$
:
demo at db<>fiddle
create or replace function setup_vector_store(table_name text)
returns void language plpgsql as $f1$
begin
execute format($dynsql$
create table %1$I (
id bigint primary key generated always as identity,
content text,
metadata jsonb,
embedding vector(1024) );
drop function if exists match_documents(vector(1024), int, jsonb);
create function match_documents (
query_embedding vector(1024),
match_count int default null,
filter jsonb default '{}'
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
) language plpgsql as $f2$ #variable_conflict use_column
begin
return query
select id,
content,
metadata,
1 - (%1$I.embedding <=> query_embedding) as similarity
from %1$I
where metadata @> filter
order by %1$I.embedding <=> query_embedding
limit match_count;
end; $f2$;
$dynsql$, table_name);
end; $f1$;
Note that you don't have to repeat the %I
placeholder in dynamic SQL calls if you use format()
's positional notation. You can re-use %1$I
multiple times, meaning that you want the 1st param, as an Identifier.
You can also run all 3 statements inside a single execute format()
.
I previously commented here that the #
thing requires a newline before begin
, which turned out to be false.
Upvotes: 0