Roki
Roki

Reputation: 2688

Javascript encodeURI like function in postgresql?

Is there any function/stored procedure in PostgreSQL/plpgsql which is same as the javascripts encodeURI?

What does it mean? Javascript have a handy built in function to encode any kind of url:

encodeURI(url) -> returns the encoded url

For example: encodeURI('http://hu.wikipedia.org/wiki/São_Paulo') -> returns a String which is "http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo"

I looking for exactly the same.

I don't want to encode each parameters separately. I don't want a function like javascript encodeURIComponent which is not the same. The example above results a different output with

encodeURIComponent('http://hu.wikipedia.org/wiki/São_Paulo')

-> "http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo"

It's encode the whole string not just the path part. So this is not what I'm looking for. I need a plpgsql function which results equivalent output to javascript function encodeURI.

Thanks!

Upvotes: 9

Views: 16164

Answers (9)

This is an update of @tsohr's answer. I have removed the unused variables and updated the FOR loop to be a little bit less verbose by using FOREACH.

By the way, regex_split_to_array was added in PostgreSQL 8.3 (See here). If you're using a version before 8.3, @tsohr's answer will do just fine for you.

CREATE OR REPLACE FUNCTION urlencode(in_str VARCHAR, OUT _result VARCHAR)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _eachChar   varchar;
    _hex        varchar;
BEGIN
    _result = '';
    FOREACH _eachChar IN ARRAY regexp_split_to_array(in_str, '') LOOP
        IF _eachChar ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _eachChar;
        ELSE
            _hex := encode(_eachChar::bytea, 'hex');
            _eachChar := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _eachChar := _eachChar || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_eachChar);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Upvotes: 0

Pavel Stehule
Pavel Stehule

Reputation: 45910

I wrote PostgreSQL extensions url_encode that solves this issue.

postgres=# select url_encode('http://hu.wikipedia.org/wiki/São_Paulo');
                      url_encode                       
───────────────────────────────────────────────────────
http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo

or

postgres=# select uri_encode('http://hu.wikipedia.org/wiki/São_Paulo');
               uri_encode                  
---------------------------------------------
http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo

Upvotes: 7

dawgman
dawgman

Reputation: 21

PLPython3 offers a straightforward approach.

Install plpython3 if you don't have it already. Example package install for the command line:

apt-get install postgresql-plpython3-9.5

Create the language (if you haven't done it already for other functions):

CREATE LANGUAGE plpython3u;

Simple function after that:

CREATE OR REPLACE FUNCTION encode_uri(input_string text)
  RETURNS text
AS $$
  import urllib.parse
  return urllib.parse.quote_plus(input_string, safe='~@#$&()*!+=:;,.?/\\''')
$$ LANGUAGE plpython3u immutable strict;

Upvotes: 2

The Coprolal
The Coprolal

Reputation: 996

This is a very old thread, but surprisingly none of the previously posted solutions seem to comply with the relevant RFC 3986. So here are two PostgreSQL functions encode_uri (if you want to encode a complete URI) and encode_uri_component (if you want to encode just a component of the URI, e.g. the key or value of a query parameter) that combines Nick's buggy solution, ElDiabolo's note, and Kev's partial solution from a related thread into a single working SQL-only solution.

How it works: first, explode the string into single characters, then hex-encode each multibyte character and and each disallowed character.

Encode URI Component:

create or replace function encode_uri_component(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''()-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

Encode Complete URI:

create or replace function encode_uri(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''();,/?:@&=+$#-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

Upvotes: 8

ElDiabolo
ElDiabolo

Reputation: 21

The usage of ::bytea in the answers of tsohr and Nick is wrong. Do a SELECT '\'::bytea to see why.

In both cases convert_to(x, 'utf-8') will give the desired result.

Upvotes: 2

Nick
Nick

Reputation: 2513

Here is "pure SQL" (no plv8, plpython or even plpgsql is needed) implementation supporting multibyte chars (including 3- and 4-byte emoji):

create or replace function urlencode(text) returns text as $$
  select
    string_agg(
      case
        when ol>1 or ch !~ '[0-9a-zA-Z:/@._?#-]+' 
          then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g')
        else ch
      end,
      ''
    )
  from (
    select ch, octet_length(ch) as ol
    from regexp_split_to_table($1, '') as ch
  ) as s;
$$ language sql immutable strict;

(source: https://github.com/NikolayS/postgrest-google-translate/pull/8)

Upvotes: 1

tsohr
tsohr

Reputation: 915

Today I encountered "Won't deal with 3 (or more) byte sequences." for Korean characters while I've been using @vyegorov 's answer quite a long time, more than a year, need to change it just dumps the bytea hex strings with "%" prefixed.

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _hex    varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _hex := encode(_temp::bytea, 'hex');
            _temp := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _temp := _temp || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

example,

SELECT urlencode('a') UNION ALL  --> "a"
SELECT urlencode('À') UNION ALL  --> "%C3%80"
SELECT urlencode('Ā') UNION ALL  --> "%C4%80"
SELECT urlencode('ə') UNION ALL  --> "%C9%99"
SELECT urlencode('α') UNION ALL  --> "%CE%B1"
SELECT urlencode('가') UNION ALL --> "%EA%B0%80"
SELECT urlencode('上') UNION ALL --> "%E4%B8%8A"
SELECT urlencode('い')           --> "%E3%81%84"

Upvotes: 6

Neil McGuigan
Neil McGuigan

Reputation: 48277

With PL/V8...cheating?

create function encode_uri(text) returns text language plv8 strict immutable as $$
  return encodeURI($1);
$$;

Upvotes: 7

vyegorov
vyegorov

Reputation: 22905

Slow and inefficient, consider doing C version of this function:

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _ascii := ascii(_temp);
            IF _ascii > x'07ff'::int4 THEN
                RAISE EXCEPTION 'Won''t deal with 3 (or more) byte sequences.';
            END IF;
            IF _ascii <= x'07f'::int4 THEN
                _temp := '%'||to_hex(_ascii);
            ELSE
                _temp := '%'||to_hex((_ascii & x'03f'::int4)+x'80'::int4);
                _ascii := _ascii >> 6;
                _temp := '%'||to_hex((_ascii & x'01f'::int4)+x'c0'::int4)
                            ||_temp;
            END IF;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Results:

# select urlencode('http://hu.wikipedia.org/wiki/São_Paulo');
-[ RECORD 1 ]------------------------------------------
urlencode | http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo

Upvotes: 16

Related Questions