b0gusb
b0gusb

Reputation: 4721

How to encode a byte array to base32 in PostgreSQL

I have a Java application that generates ids for primary keys by encoding cryptographically random bytes to base32 (RFC4648).

How do I do the same with PostgreSQL from an SQL script?

The gen_random_bytes seems to do the job for generating the random bytes but it seems there isn't anything available to encode them to base32.

Upvotes: 3

Views: 3061

Answers (2)

Mr. Curious
Mr. Curious

Reputation: 927

Was curious if it was possible to convert bytea of any length to base32. Here's sql code with a couple of helper functions to ease the understanding. It appears all used built-in functions are marked immutable so the helpers should inline easily.

Tested on Postgres 14:

create or replace function public.bytea_to_varbit(p_bytes bytea)
returns varbit as $$
-- https://stackoverflow.com/a/56008861/1671505
select right(p_bytes::text, -1)::varbit;
$$ language sql immutable;

create or replace function public.varbit40_to_base32(p_num varbit(40))
returns text as $$
                    -- https://www.rfc-editor.org/rfc/rfc4648#section-6
    select string_agg(('{"A","B","C","D","E","F","G","H"
                        ,"I","J","K","L","M","N","O","P"
                        ,"Q","R","S","T","U","V","W","X"
                        ,"Y","Z","2","3","4","5","6","7"}'::text[])
                     -- shift right to 5 lsb and mask off the rest
                        [(p_num >> s & 31::bit(40))::bigint::int + 1]
                    -- concatenate resulting characters in order from left to right
                        , '' order by o)
        -- generate bit-shifts to move all five-bit positions to the right
      from generate_series(35, 0, -5) with ordinality m(s, o)
$$ language sql immutable;

create or replace function public.bytea_to_base32(p_val bytea, p_omit_padding bool = true)
returns text as $$
with v(val, padding, trm) as (
    select string_agg(varbit40_to_base32(bytea_to_varbit(substring (padded from b for 5))), '' order by o)
         -- https://www.rfc-editor.org/rfc/rfc4648#section-6 (1)-(5):
         , ('{"","======","====","===","="}'::text[])[rem + 1]
         , ('{0,6,4,3,1}'::int[])[rem + 1]
      from (values(length(p_val), length(p_val) % 5 
        -- pad length to be divisible by 5
         , p_val || substring(bytea '\x0000000000' from 1 for (5 - length(p_val) % 5) % 5)) 
           ) v(len, rem, padded)
        -- rfc 4648 6: breakpoints for breaking the byte string into 40-bit groups
         , generate_series(1, len + (5 - rem) % 5, 5) with ordinality c(b,o)
     group by p_val, rem
)                
-- rfc4648 3.2
select case when p_omit_padding then
            case when trm > 0 then left(val, -trm) else val end
       else
            case when trm > 0 then overlay(val placing padding from length(val)-trm+1) else val end
       end
       from v

       union all
       -- https://www.rfc-editor.org/rfc/rfc4648#section-10: empty input returns empty string
       select ''
 limit 1;
$$ language sql immutable strict;

/*
https://www.rfc-editor.org/rfc/rfc4648#section-10
select inp, expected, bytea_to_base32(inp::bytea, omit_padding)
  from (values
         ('', '', true)
       , ('f', 'MY', true)
       , ('fo', 'MZXQ', true)
       , ('foo', 'MZXW6', true)
       , ('foob', 'MZXW6YQ', true)
       , ('fooba', 'MZXW6YTB', true)
       , ('foobar', 'MZXW6YTBOI', true)
       , ('', '', false)
       , ('f', 'MY======', false)
       , ('fo', 'MZXQ====', false)
       , ('foo', 'MZXW6===', false)
       , ('foob', 'MZXW6YQ=', false)
       , ('fooba', 'MZXW6YTB', false)
       , ('foobar', 'MZXW6YTBOI======', false)
       ) t(inp, expected, omit_padding)
  where bytea_to_base32(inp::bytea, omit_padding) <> expected
;
*/

Essentially, you need to break down your input into 5-byte groups that in turn get broken into eight 5-bit strings from left to right.

The easiest way is to generate breakpoint positions with generate_series and then chunk the byte string with substring(x from b for desired_len).

The weird double remainder (5-len()%5)%5) is to clamp the length of padding correctly (otherwise for reminder 0 you get an extra 5-byte padding, since 5-0 = 5, but you really want 0-length padding for that case).

I find that array literal lookups (the ({'A', ... }::text[])[position_calculation] part) make the code a bit more obvious, if somewhat more noisy.

To generate base_32_hex per RFC4648 section 7 you only need to replace the alphabet array in varbit_to_base32. Or you could have the alphabet as input to varbit40_to_base32 and pass the right alphabet from the caller.

Upvotes: 3

pyramation
pyramation

Reputation: 1671

I searched this exact questions a few years back and finally built a pure plpgsql schema with functions to handle this and figured I'd leave it here if anyone comes across it.

credit to this video and RFC https://www.youtube.com/watch?v=Va8FLD-iuTg https://www.rfc-editor.org/rfc/rfc4648

here is the extension https://github.com/pyramation/base32/blob/master/packages/base32/sql/launchql-base32--0.0.1.sql

once you install it, you can do

select base32.encode('foo');
-- MZXW6===


select base32.decode('MZXW6===');
-- foo

Upvotes: 3

Related Questions