Yavanosta

Reputation: 1670

Append binary content to bytea column in PostgreSQL database

My case:

I have some binary data, which has been broken into a lot of parts. A table FileParts contains fileId, partNo, and data.

I need to assemble all the parts into another table, MyFilesStorage.

Now I'm choosing between two implementations:

DO $CODE$
declare
  r record;
begin
  -- start from an empty value, then append each part directly in the table
  UPDATE public.MyFilesStorage SET mainFileData = E''::bytea WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

  for r in ( select data 
             from public.FileParts 
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    -- one UPDATE per part, appending to the stored value
    UPDATE public.MyFilesStorage SET mainFileData = mainFileData || r.data WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
  end loop;

end;
$CODE$

Here I set the data to empty, then read the parts one by one and append each of them to the main table.

Another variant:

DO $CODE$
declare
  r record;
  p_result bytea;
begin
  -- accumulate the whole file in a variable first
  p_result := E''::bytea;

  for r in ( select data 
             from public.FileParts 
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    p_result := p_result || r.data;
  end loop;

  -- then write it out in a single UPDATE
  UPDATE public.MyFilesStorage SET mainFileData = p_result WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
end;
$CODE$

Here I use a temporary variable. The second version is much quicker, but I don't know which one takes more memory. In the second I clearly need enough RAM to hold the whole file; what about the first? Will PostgreSQL load the whole existing content to evaluate mainFileData = mainFileData || r.data?

Maybe there is another way to do this, because both variants are very slow? In Oracle I use DBMS_LOB.APPEND for this operation.
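I haven't found a direct equivalent, but PostgreSQL's server-side large-object API seems the closest thing, since it supports true appends. A rough sketch of what I mean, assuming the column could be switched from bytea to an oid referencing a large object (the lo_* calls are the standard server-side functions; 131072 is the INV_WRITE mode constant):

DO $CODE$
declare
  r record;
  v_loid oid;
  v_fd integer;
begin
  v_loid := lo_create(0);           -- 0: let the server pick the OID
  v_fd := lo_open(v_loid, 131072);  -- 131072 = INV_WRITE
  for r in ( select data 
             from public.FileParts 
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    perform lowrite(v_fd, r.data);  -- writes at the current position, i.e. appends
  end loop;
  perform lo_close(v_fd);
  -- v_loid would then be stored in MyFilesStorage in a column of type oid
end;
$CODE$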

Upvotes: 1

Views: 5174

Answers (2)

vyegorov

Reputation: 22845

Your approach looks correct; check the PostgreSQL manual here.

You can also define your own aggregate to do the job for you:

CREATE AGGREGATE bytea_add(bytea)
(
    sfunc = byteacat,  -- byteacat is the function behind the || operator for bytea
    stype = bytea,
    initcond = E''
);

And then use plain SQL, like this:

UPDATE public.MyFilesStorage SET mainFileData = (
    SELECT bytea_add(data) FROM public.FileParts
     WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
     -- ORDER BY partNo -- will not work in an aggregating subquery
 )
 WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

EDIT:

UPDATE public.MyFilesStorage mfs SET mainFileData = fp.all_data
  FROM (
    SELECT bytea_add(data) OVER (ORDER BY partNo) AS all_data,
           rank() OVER (ORDER BY partNo DESC) AS pos
      FROM public.FileParts
     WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
 ) AS fp
 WHERE fp.pos = 1
   AND mfs.id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

You can check the output of the inner SQL separately.
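On PostgreSQL 9.0 or later (an assumption about the version in play), you can skip both the custom aggregate and the window trick: the built-in string_agg has a bytea variant, and an ORDER BY is allowed inside the aggregate call itself:

UPDATE public.MyFilesStorage SET mainFileData = (
    SELECT string_agg(data, ''::bytea ORDER BY partNo)
      FROM public.FileParts
     WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
 )
 WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;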

Upvotes: 3

Daniel Vérité

Reputation: 61506

The first version is slower because PostgreSQL doesn't do in-place updates at the storage level; it creates a new version of the row for each UPDATE. So for a row that grows from 0MB to 100MB in 10MB increments, what really gets written to disk is not 10×10MB but rather 10MB+20MB+30MB+...+90MB+100MB = 550MB. On the other hand, memory consumption stays low, with no more than 10MB allocated at a time.

The second version is faster, with only 100MB to write, but it needs to allocate 100MB in memory.

The structure of the FileParts table, with ordered chunks, is generally easier to manage for large contents, so why bother converting it into the other, monolithic structure?
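For instance, a reader can consume the file in order, chunk by chunk, with a single query, without the server ever building the monolithic value in memory:

SELECT data
  FROM public.FileParts
 WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
 ORDER BY partNo;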

Upvotes: 3
