Reputation: 7011
How can a string be urlencoded and embedded into the URL? Please note that I am not trying to GET or POST data, so curl's -G, --data, and --data-urlencode options don't seem to do the job.
For example, if you used
curl -G http://example.com/foo --data-urlencode "bar=spaced data"
that would be functionally equivalent to
curl "http://example.com/foo?bar=spaced%20data"
which is not desired.
I have a string, foo/bar, which must be urlencoded as foo%2fbar and embedded into the URL:
curl http://example.com/api/projects/foo%2fbar/events
One hypothetical solution (if I could find something like this) would be to preprocess the data in bash, if some kind of urlencode function exists:
DATA=foo/bar
ENCODED=`urlencode $DATA`
curl http://example.com/api/projects/${ENCODED}/events
Another hypothetical solution (if I could find something like this) would be some switch in curl, similar to this:
curl http://example.com/api/projects/{0}/events --string-urlencode "0=foo/bar"
The specific reason I'm looking for an answer to this question is the GitLab API. For example, in GitLab's "get single project" call, NAMESPACE/PROJECT_NAME is URL-encoded, e.g. /api/v3/projects/diaspora%2Fdiaspora (where / is represented by %2F). Further to this, you can request individual properties of the project, so you end up with a URL such as http://example.com/projects/diaspora%2Fdiaspora/events
Although this question is GitLab-specific, I imagine it's applicable to REST APIs in general, and I'm surprised I can't find a pre-existing answer on Stack Overflow or through an internet search.
Upvotes: 15
Views: 33994
Reputation: 2811
For larger inputs, a recursive awk function would be barely slower than python3's built-in urllib.parse.quote(), while jq is by far the slowest (in these benchmarks, awk always went first in order to eliminate any possibility that it benefits from system caching):
in0: 39.0MiB 0:00:00 [ 418MiB/s] [ 418MiB/s] [==>] 100%
out9: 74.6MiB 0:00:01 [55.7MiB/s] [55.7MiB/s] [<=>]
( pvE 0.1 in0 < "$___" | mawkUx ; )
1.27s user 0.09s system 99% cpu 1.365 total d083f07bbe4a3a55d14e2b6b2703c25d
in0: 39.0MiB 0:00:00 [1.40GiB/s] [1.40GiB/s] [==>] 100%
out9: 74.6MiB 0:00:01 [63.4MiB/s] [63.4MiB/s] [<=>]
( pvE 0.1 in0 < "$___" | python3 -c ; )
1.12s user 0.07s system 99% cpu 1.202 total d083f07bbe4a3a55d14e2b6b2703c25d
in0: 39.0MiB 0:00:00 [ 317MiB/s] [ 317MiB/s] [==>] 100%
out9: 74.6MiB 0:00:02 [30.5MiB/s] [30.5MiB/s] [<=>]
( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; )
2.39s user 0.07s system 99% cpu 2.472 total d083f07bbe4a3a55d14e2b6b2703c25d
Once in a blue moon, awk is even faster than python's built-in:
in0: 153MiB 0:00:00 [1.23GiB/s] [1.23GiB/s] [==>] 100%
out9: 199MiB 0:00:01 [ 102MiB/s] [ 102MiB/s] [ <=>]
( pvE 0.1 in0 < "$___" | mawkUx ; )
1.68s user 0.26s system 98% cpu 1.979 total 827f416a5302a6fad2f844a86d9a4c56
in0: 153MiB 0:00:00 [2.38GiB/s] [2.38GiB/s] [==>] 100%
out9: 199MiB 0:00:03 [56.6MiB/s] [56.6MiB/s] [ <=> ]
( pvE 0.1 in0 < "$___" | python3 -c ; )
3.27s user 0.26s system 99% cpu 3.560 total 827f416a5302a6fad2f844a86d9a4c56
in0: 153MiB 0:00:01 [86.0MiB/s] [86.0MiB/s] [==>] 100%
out9: 199MiB 0:00:06 [32.3MiB/s] [32.3MiB/s] [<=> ]
( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; )
6.03s user 0.18s system 99% cpu 6.226 total 827f416a5302a6fad2f844a86d9a4c56
Calling this function with no arguments at all defaults to URL-encoding all of $0:
function urlencode_rec(__, _, ___, ____) {
if (_)
if (!___ ? _^(_^_^_ - _) < (____ = length(__)) \
: (____ = __ - ___) < _)
return ___ \
? urlencode_rec(__, _,
__ += (____ - ____%_) / _) urlencode_rec(++__, _, ___) \
: urlencode_rec(substr(__, !!_, _ = (____ - ____%_) / _),
(__ = substr(__, ++_))^(_ = "") + !_) urlencode_rec((__)_,
(___ = ! (__ = _)) + ___)
else
return substr(_, (__ = urlencode((_ = !_ < _) ? __ : \
substr($(_++), _ + (_ *= (++_*_*_)^_^_) * --__,
(___ - __) * _)))^!_, -gsub(/\+/, "%20", __))__
else if ((____ = (_ = substr(_, _, _)) + !__) &&
(__ == _) * (__ == (_ < _)) < ____)
return __
else if ((____ = ____ ? -length() : length(__)) <= (\
___ = (_ += ++_) * (_*_*_)^_^_) && -___ <= ____)
return substr(_ = !_, _, ____ && ((__ = urlencode(_ < ____ \
? __ : $_))^_ < -gsub(/\+/, "%20", __)))__
else if (____ < !_ &&
__ = ((__ = -____) - (__ %= ___)) / ___ + !!__)
return \
(___ = __ <= _) ? urlencode_rec(___, -_, __) \
: urlencode_rec(!___, -_, ___ = (__ - __%_) / _) \
urlencode_rec(++___, -_, __)
else
return urlencode_rec(substr(__, ___ = !!_,
____ = (____ - ____%_) / _), ___ += (__ = substr(__,
++____))^(_ = "")) urlencode_rec((__)_, (__ = _) + ___)
}
Upvotes: 0
Reputation: 245
Since the question is how to urlencode with bash or curl, here is a sed-based bash function:
function urlencode() {
sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}
You can add another function to encode the / separately:
function encode_slash() { sed 's/\x2F/%2F/g'; }
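The order of the substitutions in a chain like this matters: % has to be rewritten first, otherwise the percent signs produced by the later rules would be encoded a second time. A minimal sketch of that idea, using a deliberately reduced rule set and a hypothetical function name, urlencode_min (an illustration only, not the full function above):

```shell
# Hypothetical reduced encoder: handles only %, space, and / to show the ordering.
urlencode_min() {
    # Encode % first so the % signs introduced by the later rules are left alone.
    sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's|/|%2F|g'
}

printf '%s\n' '100% a/b' | urlencode_min
```

If the % rule were moved to the end of the chain, the %20 and %2F produced earlier would become %2520 and %252F.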
This should work for most cases, but if you need to handle Unicode, you need to convert those characters separately. The script below gives you the functionality of jq -jRr '@uri', but jq is written in C, so it will of course be much quicker on large amounts of Unicode. The script is good for occasional Unicode characters:
#!/bin/bash
## Written by Adam Danischewski 08/04/2024
declare CURR_ORD
str="${1:-š.mp4}"
function ord() {
printf -v CURR_ORD "%d" "\"$1"
}
function has_unicode() {
local input="$1"
local -i charcnt=$(wc -m <<<"$input")
local -i bytecnt=$(wc -c <<<"$input")
((charcnt!=bytecnt))
return $?
}
function encode_unicode() {
for ((i=0;i<${#str};i++)); do
char=${str:i:1}
ord "$char"
if ((CURR_ORD > 127)); then  # non-ASCII code point: percent-encode its UTF-8 bytes
od -t x1 <<< "$char" | awk '{$1="";gsub("^[[:space:]]*","");for(i=1;i<NF;i++) printf "%%" toupper($i);}'
else
printf "%s" "$char"
fi
done
}
## Tokenize percents before encoding unicode
function tokenize_orig_pcts() {
sed 's/%/\x01/g'
}
## Tokenize percents after encoding unicode, since this is urlencoded..
function tokenize_pcts() {
sed 's/%/\x02/g'
}
function detokenize_orig_pcts() {
sed 's/\x01/%/g'
}
function detokenize_pcts() {
sed 's/\x02/%/g'
}
function urlencode() {
sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
s/\x2C/%2C/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}
function main() {
if has_unicode "$str"; then
str=$(tokenize_orig_pcts <<< "$str")
str=$(encode_unicode)
str=$(tokenize_pcts <<< "$str")
str=$(detokenize_orig_pcts <<< "$str")
str=$(urlencode <<< "$str")
detokenize_pcts <<< "$str"
else
urlencode <<< "$str"
fi
}
main
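The od step in encode_unicode can also be applied to a whole string at once. The following is a rough sketch of that byte-level idea, not a replacement for the script above: it percent-encodes every byte, including plain ASCII letters that do not need escaping, so the result is longer than necessary but should decode to the same bytes. The function name encode_all_bytes is made up for this illustration.

```shell
# Hypothetical helper: percent-encode every byte of the first argument.
encode_all_bytes() {
    printf '%s' "$1" |
        od -An -v -t x1 |      # dump the bytes as hex pairs, without offsets
        tr -d ' \n' |          # join the pairs into one continuous string
        sed 's/../%&/g' |      # prefix each hex pair with %
        tr 'a-f' 'A-F'         # uppercase the hex digits
}

encode_all_bytes 'a/b'
```

Because it works on bytes rather than characters, it does not depend on the locale or on how the shell counts multi-byte characters.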
Upvotes: 0
Reputation: 31
Expanding on @guilherme-z-santos's answer:
ue() { local in=$1; if [ -z "$in" ]; then read -r in; fi; printf '%s\n' "$in" | jq -Rr '@uri'; }
The original used jq's -s option, which adds an unwanted %0A at the end, so it was dropped here. Also, for it to be a proper one-line function, bash needs a space after the opening { and a ; before the closing }.
It can now be used:
$ ue ffoooƤ
ffooo%C3%A4
$ ue ffooo
ffooo
$ echo foooƤ|ue
fooo%C3%A4
Upvotes: 1
Reputation: 637
One that supports multiple input lines, building on Julio's answers:
python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"
Which lets me do this on macOS (copy something to the clipboard, then send it to test an endpoint):
alias urlencode='python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"'
curl -X 'POST' -H 'accept: application/json' \
"http://127.0.0.1:11434/generate?content=$(pbpaste | urlencode)"
Upvotes: 1
Reputation: 2089
One-liner in Python3:
python3 -c "from urllib.parse import quote; print(quote(input('Type here: ')))"
Upvotes: 1
Reputation: 237
Adding to @rici's comment on the accepted answer (which has more upvotes than the answer itself), we may create the function ue (short for urlencode, but you may call it as you wish) on top of jq and make it read from stdin or from an argument:
ue() {local in=$1; if [ -z "$in" ]; then read in; fi; echo $in|jq -sRr '@uri'}
echo "https://example.com/$(ue "some part")/?date=$(date|ue)"
One-liner, no Python, just jq, plain and simple...
Upvotes: 0
Reputation: 295443
The urlencode function you propose is easy enough to implement:
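urlencode() {
  # Python 3: quote() lives in urllib.parse; the second argument lists characters left unencoded
  python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], sys.argv[2]))' \
    "$1" "$urlencode_safe"
}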
...used as:
data=foo/bar
encoded=$(urlencode "$data")
curl "http://example.com/api/projects/${encoded}/events"
If you want to have some characters which are passed through literally -- in many use cases, this is desired for /s -- instead use:
encoded=$(urlencode_safe='/' urlencode "$data")
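If Python is not available, the same idea can be sketched in plain shell. The version below is a minimal illustration under a few assumptions: the name urlencode_sh is made up here, only the RFC 3986 unreserved characters (letters, digits, and . _ ~ -) are passed through, and non-ASCII input is intended to be encoded byte by byte under LC_ALL=C, which depends on the shell's printf and is worth checking on your system.

```shell
# Hypothetical pure-shell encoder; the ( ) body runs in a subshell so its
# variables and the LC_ALL change do not leak into the caller.
urlencode_sh() (
    LC_ALL=C                     # ASCII-only ranges in the case pattern below
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}              # everything after the first character
        c=${s%"$rest"}           # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character's numeric value
        esac
    done
    printf '%s\n' "$out"
)

# Build the URL from the question without contacting any server.
printf 'http://example.com/api/projects/%s/events\n' "$(urlencode_sh 'foo/bar')"
```

It forks a printf for every escaped character, so it is only reasonable for short strings such as a project path.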
Upvotes: 15