Reputation: 1670
My case:
I have some binary data which has been broken into many parts. A special table FileParts contains fileId, partNo, and data.
I need to assemble all the parts into another table, MyFilesStorage.
Now I'm choosing between two implementations:
DO $CODE$
declare
  r record;
begin
  -- reset the target column, then append each part in order
  UPDATE public.MyFilesStorage SET mainFileData = E''::bytea
    WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
  for r in ( select data
             from public.FileParts
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    UPDATE public.MyFilesStorage SET mainFileData = mainFileData || r.data
      WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
  end loop;
end;
$CODE$
Here I set the data to empty, then read the parts one by one and append each of them to the main table.
Another variant:
DO $CODE$
declare
  r record;
  p_result bytea;
begin
  p_result = E''::bytea;
  -- accumulate all parts in a variable, then write the result once
  for r in ( select data
             from public.FileParts
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    p_result = p_result || r.data;
  end loop;
  UPDATE public.MyFilesStorage SET mainFileData = p_result
    WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
end;
$CODE$
Here I use a temporary variable. The second variant is much quicker, but I don't know which one takes more memory. For the second I need enough memory to hold the whole file in RAM; what about the first? Will PostgreSQL load the whole content here: mainFileData = mainFileData || r.data?
Maybe there is another way to do this, because both variants are very slow? In Oracle I use DBMS_LOB.APPEND for this operation.
Upvotes: 1
Views: 5174
Reputation: 22845
Your approach looks correct; check the PostgreSQL manual here.
You can also define your own aggregate to do the job for you:
CREATE AGGREGATE bytea_add(bytea)
(
sfunc = byteacat,
stype = bytea,
initcond = E''
);
And then use plain SQL, like this:
UPDATE public.MyFilesStorage SET mainFileData = (
    SELECT bytea_add(data) FROM public.FileParts
    WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
    -- ORDER BY partNo -- ORDER BY partNo will not work here
)
WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
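If your server is new enough to accept an ORDER BY inside the aggregate call itself (PostgreSQL 9.0 and later), a sketch of the same update that keeps the parts in sequence could be:
UPDATE public.MyFilesStorage SET mainFileData = (
    SELECT bytea_add(data ORDER BY partNo)  -- the aggregate receives the rows in partNo order
    FROM public.FileParts
    WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
)
WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;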
EDIT:
UPDATE public.MyFilesStorage mfs SET mainFileData = fp.all_data
FROM (
SELECT bytea_add(data) OVER (ORDER BY partNo) AS all_data,
       rank() OVER (ORDER BY partNo DESC) AS pos
FROM public.FileParts
WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
) AS fp
WHERE fp.pos = 1
AND mfs.id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
You can check the output of the inner SQL separately.
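For instance, a quick sanity check of that inner query (showing only the running length so the binary itself is not printed) could look like this, reusing the bytea_add aggregate defined above:
SELECT partNo,
       length(bytea_add(data) OVER (ORDER BY partNo)) AS bytes_so_far  -- running size of the concatenation
FROM public.FileParts
WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
ORDER BY partNo;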
Upvotes: 3
Reputation: 61506
The first version is slower because PostgreSQL doesn't do in-place updates at the storage level; it creates a new version of the row for each UPDATE. So for a row that grows from 0MB to 100MB in 10MB increments, what is really written to disk is not 10×10MB but rather 10MB + 20MB + 30MB + ... + 90MB + 100MB = 550MB. On the other hand, memory consumption stays low, with no more than 10MB allocated at a time.
The second version is faster, with only 100MB to write, but it needs to allocate 100MB in memory.
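One rough way to observe that write amplification yourself (a sketch, using the table name from the question) is to compare the table's total on-disk size, including its TOAST data, before and after each run:
SELECT pg_size_pretty(pg_total_relation_size('public.MyFilesStorage'));
-- run one of the DO blocks, then repeat the query; the dead row versions
-- produced by the row-by-row variant stay on disk until VACUUM reclaims them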
The structure of the FileParts table, with ordered chunks, is generally easier to manage for large contents, so why bother converting it into the other, monolithic structure?
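For instance, a consumer can simply stream the chunks in sequence and never needs a monolithic copy at all (a sketch against the tables from the question):
SELECT data
FROM public.FileParts
WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
ORDER BY partNo;  -- the client reads or streams the parts in order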
Upvotes: 3