Reputation: 4721
I have a Java application that generates ids for primary keys by encoding cryptographically random bytes to base32 (RFC4648).
How do I do the same with PostgreSQL from an SQL script?
The gen_random_bytes
seems to do the job for generating the random bytes but it seems there isn't anything available to encode them to base32.
Upvotes: 3
Views: 3061
Reputation: 927
Was curious if it was possible to convert bytea
of any length to base32
. Here's sql code with a couple of helper functions to ease the understanding. It appears all used built-in functions are marked immutable
so the helpers should inline easily.
Tested on Postgres 14:
create or replace function public.bytea_to_varbit(p_bytes bytea)
returns varbit as $$
-- https://stackoverflow.com/a/56008861/1671505
select right(p_bytes::text, -1)::varbit;
$$ language sql immutable;
create or replace function public.varbit40_to_base32(p_num varbit(40))
returns text as $$
-- https://www.rfc-editor.org/rfc/rfc4648#section-6
select string_agg(('{"A","B","C","D","E","F","G","H"
,"I","J","K","L","M","N","O","P"
,"Q","R","S","T","U","V","W","X"
,"Y","Z","2","3","4","5","6","7"}'::text[])
-- shift right to 5 lsb and mask off the rest
[(p_num >> s & 31::bit(40))::bigint::int + 1]
-- concatenate resulting characters in order from left to right
, '' order by o)
-- generate bit-shifts to move all five-bit positions to the right
from generate_series(35, 0, -5) with ordinality m(s, o)
$$ language sql immutable;
create or replace function public.bytea_to_base32(p_val bytea, p_omit_padding bool = true)
returns text as $$
with v(val, padding, trm) as (
select string_agg(varbit40_to_base32(bytea_to_varbit(substring (padded from b for 5))), '' order by o)
-- https://www.rfc-editor.org/rfc/rfc4648#section-6 (1)-(5):
, ('{"","======","====","===","="}'::text[])[rem + 1]
, ('{0,6,4,3,1}'::int[])[rem + 1]
from (values(length(p_val), length(p_val) % 5
-- pad length to be divisible by 5
, p_val || substring(bytea '\x0000000000' from 1 for (5 - length(p_val) % 5) % 5))
) v(len, rem, padded)
-- rfc 4648 6: breakpoints for breaking the byte string into 40-bit groups
, generate_series(1, len + (5 - rem) % 5, 5) with ordinality c(b,o)
group by p_val, rem
)
-- rfc4648 3.2
select case when p_omit_padding then
case when trm > 0 then left(val, -trm) else val end
else
case when trm > 0 then overlay(val placing padding from length(val)-trm+1) else val end
end
from v
union all
-- https://www.rfc-editor.org/rfc/rfc4648#section-10: empty input returns empty string
select ''
limit 1;
$$ language sql immutable strict;
/*
https://www.rfc-editor.org/rfc/rfc4648#section-10
select inp, expected, bytea_to_base32(inp::bytea, omit_padding)
from (values
('', '', true)
, ('f', 'MY', true)
, ('fo', 'MZXQ', true)
, ('foo', 'MZXW6', true)
, ('foob', 'MZXW6YQ', true)
, ('fooba', 'MZXW6YTB', true)
, ('foobar', 'MZXW6YTBOI', true)
, ('', '', false)
, ('f', 'MY======', false)
, ('fo', 'MZXQ====', false)
, ('foo', 'MZXW6===', false)
, ('foob', 'MZXW6YQ=', false)
, ('fooba', 'MZXW6YTB', false)
, ('foobar', 'MZXW6YTBOI======', false)
) t(inp, expected, omit_padding)
where bytea_to_base32(inp::bytea, omit_padding) <> expected
;
*/
Essentially, you need to break down your input into 5-byte groups that in turn get broken into eight 5-bit strings from left to right.
The easiest way is to generate breakpoint positions with generate_series
and then chunk the byte string with substring(x from b for desired_len)
.
The weird double remainder (5-len()%5)%5)
is to clamp the length of padding correctly (otherwise for reminder 0 you get an extra 5-byte padding, since 5-0 = 5, but you really want 0-length padding for that case).
I find that array literal lookups (the ({'A', ... }::text[])[position_calculation]
part) make the code a bit more obvious, if somewhat more noisy.
To generate base_32_hex
per RFC4648 section 7 you only need to replace the alphabet array in varbit_to_base32
. Or you could have the alphabet as input to varbit40_to_base32
and pass the right alphabet from the caller.
Upvotes: 3
Reputation: 1671
I searched this exact questions a few years back and finally built a pure plpgsql schema with functions to handle this and figured I'd leave it here if anyone comes across it.
credit to this video and RFC https://www.youtube.com/watch?v=Va8FLD-iuTg https://www.rfc-editor.org/rfc/rfc4648
here is the extension https://github.com/pyramation/base32/blob/master/packages/base32/sql/launchql-base32--0.0.1.sql
once you install it, you can do
select base32.encode('foo');
-- MZXW6===
select base32.decode('MZXW6===');
-- foo
Upvotes: 3