Reputation: 2688
Is there any function/stored procedure in PostgreSQL/plpgsql which is same as the javascripts encodeURI?
What does it mean? Javascript have a handy built in function to encode any kind of url:
encodeURI(url) -> returns the encoded url
For example:
encodeURI('http://hu.wikipedia.org/wiki/São_Paulo')
-> returns a String which is "http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo"
I looking for exactly the same.
I don't want to encode each parameters separately. I don't want a function like javascript encodeURIComponent which is not the same. The example above results a different output with
encodeURIComponent('http://hu.wikipedia.org/wiki/São_Paulo')
-> "http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo"
It's encode the whole string not just the path part. So this is not what I'm looking for. I need a plpgsql function which results equivalent output to javascript function encodeURI.
Thanks!
Upvotes: 9
Views: 16164
Reputation: 1
This is an update of @tsohr's answer. I have removed the unused variables and updated the FOR loop to be a little bit less verbose by using FOREACH.
By the way, regex_split_to_array
was added in PostgreSQL 8.3 (See here). If you're using a version before 8.3, @tsohr's answer will do just fine for you.
CREATE OR REPLACE FUNCTION urlencode(in_str VARCHAR, OUT _result VARCHAR)
STRICT IMMUTABLE AS $urlencode$
DECLARE
_eachChar varchar;
_hex varchar;
BEGIN
_result = '';
FOREACH _eachChar IN ARRAY regexp_split_to_array(in_str, '') LOOP
IF _eachChar ~ '[0-9a-zA-Z:/@._?#-]+' THEN
_result := _result || _eachChar;
ELSE
_hex := encode(_eachChar::bytea, 'hex');
_eachChar := '';
WHILE LENGTH(_hex) > 0 LOOP
_eachChar := _eachChar || '%' || SUBSTRING(_hex, 1, 2);
_hex := SUBSTRING(_hex, 3, 999);
END LOOP;
_result := _result || upper(_eachChar);
END IF;
END LOOP;
RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;
Upvotes: 0
Reputation: 45910
I wrote PostgreSQL extensions url_encode that solves this issue.
postgres=# select url_encode('http://hu.wikipedia.org/wiki/São_Paulo');
url_encode
───────────────────────────────────────────────────────
http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo
or
postgres=# select uri_encode('http://hu.wikipedia.org/wiki/São_Paulo');
uri_encode
---------------------------------------------
http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo
Upvotes: 7
Reputation: 21
PLPython3 offers a straightforward approach.
Install plpython3 if you don't have it already. Example package install for the command line:
apt-get install postgresql-plpython3-9.5
Create the language (if you haven't done it already for other functions):
CREATE LANGUAGE plpython3u;
Simple function after that:
CREATE OR REPLACE FUNCTION encode_uri(input_string text)
RETURNS text
AS $$
import urllib.parse
return urllib.parse.quote_plus(input_string, safe='~@#$&()*!+=:;,.?/\\''')
$$ LANGUAGE plpython3u immutable strict;
Upvotes: 2
Reputation: 996
This is a very old thread, but surprisingly none of the previously posted solutions seem to comply with the relevant RFC 3986. So here are two PostgreSQL functions encode_uri (if you want to encode a complete URI) and encode_uri_component (if you want to encode just a component of the URI, e.g. the key or value of a query parameter) that combines Nick's buggy solution, ElDiabolo's note, and Kev's partial solution from a related thread into a single working SQL-only solution.
How it works: first, explode the string into single characters, then hex-encode each multibyte character and and each disallowed character.
Encode URI Component:
create or replace function encode_uri_component(text) returns text as $$
select string_agg(
case
when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''()-]+' then
regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
else
c
end,
''
)
from (
select c, octet_length(c) bytes
from regexp_split_to_table($1, '') c
) q;
$$ language sql immutable strict;
Encode Complete URI:
create or replace function encode_uri(text) returns text as $$
select string_agg(
case
when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''();,/?:@&=+$#-]+' then
regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
else
c
end,
''
)
from (
select c, octet_length(c) bytes
from regexp_split_to_table($1, '') c
) q;
$$ language sql immutable strict;
Upvotes: 8
Reputation: 21
The usage of ::bytea in the answers of tsohr and Nick is wrong. Do a SELECT '\'::bytea to see why.
In both cases convert_to(x, 'utf-8') will give the desired result.
Upvotes: 2
Reputation: 2513
Here is "pure SQL" (no plv8, plpython or even plpgsql is needed) implementation supporting multibyte chars (including 3- and 4-byte emoji):
create or replace function urlencode(text) returns text as $$
select
string_agg(
case
when ol>1 or ch !~ '[0-9a-zA-Z:/@._?#-]+'
then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g')
else ch
end,
''
)
from (
select ch, octet_length(ch) as ol
from regexp_split_to_table($1, '') as ch
) as s;
$$ language sql immutable strict;
(source: https://github.com/NikolayS/postgrest-google-translate/pull/8)
Upvotes: 1
Reputation: 915
Today I encountered "Won't deal with 3 (or more) byte sequences." for Korean characters while I've been using @vyegorov 's answer quite a long time, more than a year, need to change it just dumps the bytea hex strings with "%" prefixed.
CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
STRICT IMMUTABLE AS $urlencode$
DECLARE
_i int4;
_temp varchar;
_hex varchar;
_ascii int4;
BEGIN
_result = '';
FOR _i IN 1 .. length(in_str) LOOP
_temp := substr(in_str, _i, 1);
IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
_result := _result || _temp;
ELSE
_hex := encode(_temp::bytea, 'hex');
_temp := '';
WHILE LENGTH(_hex) > 0 LOOP
_temp := _temp || '%' || SUBSTRING(_hex, 1, 2);
_hex := SUBSTRING(_hex, 3, 999);
END LOOP;
_result := _result || upper(_temp);
END IF;
END LOOP;
RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;
example,
SELECT urlencode('a') UNION ALL --> "a"
SELECT urlencode('À') UNION ALL --> "%C3%80"
SELECT urlencode('Ā') UNION ALL --> "%C4%80"
SELECT urlencode('ə') UNION ALL --> "%C9%99"
SELECT urlencode('α') UNION ALL --> "%CE%B1"
SELECT urlencode('가') UNION ALL --> "%EA%B0%80"
SELECT urlencode('上') UNION ALL --> "%E4%B8%8A"
SELECT urlencode('い') --> "%E3%81%84"
Upvotes: 6
Reputation: 48277
With PL/V8...cheating?
create function encode_uri(text) returns text language plv8 strict immutable as $$
return encodeURI($1);
$$;
Upvotes: 7
Reputation: 22905
Slow and inefficient, consider doing C
version of this function:
CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
STRICT IMMUTABLE AS $urlencode$
DECLARE
_i int4;
_temp varchar;
_ascii int4;
BEGIN
_result = '';
FOR _i IN 1 .. length(in_str) LOOP
_temp := substr(in_str, _i, 1);
IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
_result := _result || _temp;
ELSE
_ascii := ascii(_temp);
IF _ascii > x'07ff'::int4 THEN
RAISE EXCEPTION 'Won''t deal with 3 (or more) byte sequences.';
END IF;
IF _ascii <= x'07f'::int4 THEN
_temp := '%'||to_hex(_ascii);
ELSE
_temp := '%'||to_hex((_ascii & x'03f'::int4)+x'80'::int4);
_ascii := _ascii >> 6;
_temp := '%'||to_hex((_ascii & x'01f'::int4)+x'c0'::int4)
||_temp;
END IF;
_result := _result || upper(_temp);
END IF;
END LOOP;
RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;
Results:
# select urlencode('http://hu.wikipedia.org/wiki/São_Paulo');
-[ RECORD 1 ]------------------------------------------
urlencode | http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo
Upvotes: 16