Reputation: 7011

How to urlencode data into a URL, with bash or curl

How can a string be urlencoded and embedded into the URL? Please note that I am not trying to GET or POST data, so the -G and --data and --data-urlencode options of curl don't seem to do the job.

For example, if you used

curl -G http://example.com/foo --data-urlencode "bar=spaced data"

that would be functionally equivalent to

curl "http://example.com/foo?bar=spaced%20data"

which is not desired.

I have a string foo/bar which must be urlencoded as foo%2fbar and embedded into the URL.

curl http://example.com/api/projects/foo%2fbar/events

One hypothetical solution (if I could find something like this) would be to preprocess the data in bash, if there exists some kind of urlencode function.

DATA=foo/bar
ENCODED=`urlencode $DATA`
curl http://example.com/api/projects/${ENCODED}/events

Another hypothetical solution (if I could find something like this) would be some switch in curl, similar to this:

curl http://example.com/api/projects/{0}/events --string-urlencode "0=foo/bar"

The specific reason I'm looking for an answer to this question is the Gitlab API. For example, in the gitlab "get single project" call, NAMESPACE/PROJECT_NAME is URL-encoded, e.g. /api/v3/projects/diaspora%2Fdiaspora (where / is represented by %2F). Further to this, you can request individual properties of the project, so you end up with a URL such as http://example.com/projects/diaspora%2Fdiaspora/events

Although this question is gitlab-specific, I imagine it's applicable to REST APIs in general, and I'm surprised I can't find a pre-existing answer on stackoverflow or through an internet search.

Upvotes: 15

Answers (7)

RARE Kpop Manifesto

Reputation: 2811

For larger inputs, a recursive awk function would be barely slower than python3's built-in urllib.parse.quote(), while jq is by far the slowest:

(in these benchmarks, awk always went first in order to eliminate any possibility it benefits from system caching)

      in0: 39.0MiB 0:00:00 [ 418MiB/s] [ 418MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [55.7MiB/s] [55.7MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.27s user 0.09s system 99% cpu 1.365 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [1.40GiB/s] [1.40GiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:01 [63.4MiB/s] [63.4MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

1.12s user 0.07s system 99% cpu 1.202 total d083f07bbe4a3a55d14e2b6b2703c25d
 
      in0: 39.0MiB 0:00:00 [ 317MiB/s] [ 317MiB/s] [==>] 100%            
     out9: 74.6MiB 0:00:02 [30.5MiB/s] [30.5MiB/s] [<=>]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

2.39s user 0.07s system 99% cpu 2.472 total d083f07bbe4a3a55d14e2b6b2703c25d

Once in a blue moon, awk is even faster than python's built-in:

      in0:  153MiB 0:00:00 [1.23GiB/s] [1.23GiB/s] [==>] 100%            
     out9:  199MiB 0:00:01 [ 102MiB/s] [ 102MiB/s] [ <=>]

( pvE 0.1 in0 < "$___" | mawkUx ; )  

1.68s user 0.26s system 98% cpu 1.979 total 827f416a5302a6fad2f844a86d9a4c56
 
      in0:  153MiB 0:00:00 [2.38GiB/s] [2.38GiB/s] [==>] 100%            
     out9:  199MiB 0:00:03 [56.6MiB/s] [56.6MiB/s] [ <=> ]

( pvE 0.1 in0 < "$___" | python3 -c ; )  

3.27s user 0.26s system 99% cpu 3.560 total 827f416a5302a6fad2f844a86d9a4c56

      in0:  153MiB 0:00:01 [86.0MiB/s] [86.0MiB/s] [==>] 100%            
     out9:  199MiB 0:00:06 [32.3MiB/s] [32.3MiB/s] [<=> ]

( pvE 0.1 in0 < "$___" | jq -sRr '@uri'; ) 

6.03s user 0.18s system 99% cpu 6.226 total 827f416a5302a6fad2f844a86d9a4c56

Calling this function with no arguments at all defaults to url-encoding all of $0:

function urlencode_rec(__, _, ___, ____) {

    if (_)
        if (!___ ? _^(_^_^_ - _) < (____ = length(__)) \
                                 : (____ = __ - ___) < _)
            return ___ \
                ? urlencode_rec(__, _,
                   __ += (____ - ____%_) / _) urlencode_rec(++__, _, ___) \
                : urlencode_rec(substr(__, !!_, _ = (____ - ____%_) / _),
                  (__ = substr(__, ++_))^(_ = "") + !_) urlencode_rec((__)_,
                                             (___ = ! (__ = _)) + ___)
        else
            return substr(_, (__ = urlencode((_ = !_ < _) ? __ : \
                   substr($(_++), _ + (_ *= (++_*_*_)^_^_) * --__,
                      (___ - __) * _)))^!_, -gsub(/\+/, "%20", __))__

    else if ((____ = (_ = substr(_, _, _)) + !__) &&
                     (__ == _) * (__ == (_ < _)) < ____)
        return __

    else if ((____ = ____ ? -length() : length(__)) <= (\
                   ___ = (_ += ++_) * (_*_*_)^_^_) && -___ <= ____)

        return substr(_ = !_, _, ____ && ((__ = urlencode(_ < ____ \
                        ? __ : $_))^_ < -gsub(/\+/, "%20", __)))__

    else if (____ < !_ &&
                    __ = ((__ = -____) - (__ %= ___)) / ___ + !!__)
        return \
        (___ = __ <= _) ? urlencode_rec(___, -_, __) \
                        : urlencode_rec(!___, -_, ___ = (__ - __%_) / _) \
                          urlencode_rec(++___, -_, __)
    else
        return urlencode_rec(substr(__, ___ = !!_,
                 ____ = (____ - ____%_) / _), ___ += (__ = substr(__,
               ++____))^(_ = "")) urlencode_rec((__)_, (__ = _) + ___)
}
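
The function above calls a urlencode() helper that is not shown in this answer. For readers who just want something readable, here is a minimal sketch of a per-line awk encoder wrapped in a shell function. It is not the author's implementation: the name urlencode_awk is hypothetical, it percent-encodes everything outside the RFC 3986 unreserved set, and it assumes an awk that works on bytes under LC_ALL=C (gawk and mawk should).

```shell
# Hypothetical helper: percent-encode each input line, byte by byte.
urlencode_awk() {
  LC_ALL=C awk '
    BEGIN {
      # Lookup table from a single byte to its numeric value.
      for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/)
          out = out c                            # unreserved: keep as-is
        else
          out = out sprintf("%%%02X", ord[c])    # everything else: %XX
      }
      print out
    }'
}
```

It reads from stdin, so the string from the question would be passed as printf 'foo/bar\n' | urlencode_awk.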

Upvotes: 0

Adam D.

Reputation: 245

Since the question is how to urlencode with bash or curl, here is a sed-based function:

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

You can add another to encode the / separately:

function encode_slash() { sed 's|/|%2F|g'; }

This should work for most cases, but if you need to handle Unicode, you need to convert those characters separately. The script below gives you the functionality of jq -jRr '@uri', but jq is written in C, so of course it will be much quicker on large amounts of Unicode. The script is good for occasional Unicode characters:

#!/bin/bash
 
## Written by Adam Danischewski 08/04/2024 

declare CURR_ORD 

str="${1:-😄.mp4}"

function ord() {
    printf -v CURR_ORD "%d" "\"$1"
}

function has_unicode() { 
 local input="$1"
 local -i charcnt=$(wc -m <<<"$input")
 local -i bytecnt=$(wc -c <<<"$input")
 ((charcnt!=bytecnt))
 return $?
}

function urlencode() {
    sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x2D/%2D/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function encode_unicode() { 
for ((i=0;i<${#str};i++)); do
    char=${str:i:1}
    ord "$char"
    if ((${#CURR_ORD}>3)); then 
     od -t x1 <<< "$char" | awk '{$1="";gsub("^[[:space:]]*","");for(i=1;i<NF;i++) printf "%%" toupper($i);}'
    else 
     printf "%s" "$char" 
    fi 
done
}

## Tokenize percents before encoding unicode 
function tokenize_orig_pcts() {
  sed 's/%/\x01/g'
} 

## Tokenize percents after encoding unicode, since this is urlencoded..  
function tokenize_pcts() {
  sed 's/%/\x02/g'
} 

function detokenize_orig_pcts() {
  sed 's/\x01/%/g'
} 

function detokenize_pcts() {
  sed 's/\x02/%/g'
} 

function urlencode() { 
 sed "s/\x25/%25/g;s/\x20/%20/g;s/\x21/%21/g;s/\x22/%22/g;s/\x23/%23/g;s/\x5c\x24/%24/g;\
        s/\x26/%26/g;s/\x27/%27/g;s/\x28/%28/g;s/\x29/%29/g;s/\x2A/%2A/g;s/\x2B/%2B/g;\
        s/\x2C/%2C/g;s/\x3A/%3A/g;s/\x3F/%3F/g;s/\x7C/%7C/g;s/\x5c\x5B/%5B/g"
}

function main() { 
  if has_unicode "$str"; then 
    str=$(tokenize_orig_pcts <<< "$str")
    str=$(encode_unicode)
    str=$(tokenize_pcts <<< "$str")
    str=$(detokenize_orig_pcts <<< "$str")
    str=$(urlencode <<< "$str")
    detokenize_pcts <<< "$str"
  else 
    urlencode <<< "$str"
  fi  
}

main
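
As a more compact alternative, here is a minimal sketch that avoids the per-character table by working on the raw bytes of the string. It is an illustration under assumptions, not a drop-in replacement for the script above: the function name is reused for convenience, it relies on od and tr being available, and it encodes everything outside the RFC 3986 unreserved set (so / becomes %2F and multi-byte UTF-8 characters come out as one %XX per byte).

```shell
# Sketch of a byte-wise urlencode; runs in a subshell so no variables leak.
urlencode() (
  # od prints each input byte as two hex digits; tr upper-cases them.
  for hex in $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F'); do
    case $hex in
      # Unreserved bytes: - . _ ~ 0-9 A-Z a-z
      2D|2E|5F|7E|3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A])
        printf "\\$(printf '%o' "0x$hex")" ;;   # emit the byte unchanged
      *)
        printf '%%%s' "$hex" ;;                 # emit %XX
    esac
  done
  echo
)
```

With this in place, the URL from the question could be built as curl "http://example.com/api/projects/$(urlencode foo/bar)/events". Spawning a subshell per unreserved byte makes it slow on large inputs, so it only suits short strings such as path segments.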

Upvotes: 0

tlex

Reputation: 31

Expanding on @guilherme-z-santos's answer:

ue() { local in=$1; if [ -z "$in" ]; then read -r in; fi; printf '%s\n' "$in" | jq -Rr '@uri'; }

Because the input ends with a newline, the jq param -s would add an unwanted %0A at the end, so it was dropped. Also, for it to be a proper bash function, it needs a space after the opening { and a ; before the closing }.

It can now be used:

$ ue ffoooä
ffooo%C3%A4
$ ue ffooo
ffooo
$ echo foooä|ue
fooo%C3%A4

Upvotes: 1

chronospoon

Reputation: 637

One that supports multiple input lines, building on Julio's answer:

python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"

Which lets me do this on macOS (copy something to the clipboard, then send it to test an endpoint):

alias urlencode='python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))"'

curl -X 'POST' -H 'accept: application/json' \
    "http://127.0.0.1:11434/generate?content=$(pbpaste | urlencode)"

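Two details of the one-liner above matter for the original question: urllib.parse.quote() leaves / unencoded by default (its safe parameter defaults to '/'), and the trailing newline from stdin is encoded as %0A. A minimal sketch of a variant that encodes slashes and drops trailing newlines, assuming python3 is on the PATH (the name urlencode_py is hypothetical):

```shell
# Hypothetical variant: read stdin, strip trailing newlines, encode everything
# outside the unreserved set (safe="" means "/" is encoded too).
urlencode_py() {
  python3 -c 'import sys, urllib.parse
data = sys.stdin.read().rstrip("\n")
print(urllib.parse.quote(data, safe=""))'
}
```

For the path segment in the question this would be used as printf 'foo/bar' | urlencode_py.
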
Upvotes: 1

Julio Batista Silva

Reputation: 2089

One-liner in Python3:

python3 -c "from urllib.parse import quote; print(quote(input('Type here: ')))"

Upvotes: 1

Guilherme Z. Santos

Reputation: 237

Adding to @rici's comment on the accepted answer (which has more upvotes than the answer itself), we may create the function ue (short for urlencode, but you may call it as you wish) on top of jq and make it read from stdin or from an argument:

ue() { local in=$1; if [ -z "$in" ]; then read -r in; fi; printf '%s' "$in" | jq -sRr '@uri'; }

echo "https://example.com/$(ue "some part")/?date=$(date|ue)"

One-liner, no Python, just jq, plain and simple...

Upvotes: 0

Charles Duffy

Reputation: 295443

The urlencode function you propose is easy enough to implement:

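urlencode() {
  # Python 3: quote() lives in urllib.parse; argv[2] lists characters to leave unencoded
  python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=sys.argv[2]))' \
    "$1" "$urlencode_safe"
}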

...used as:

data=foo/bar
encoded=$(urlencode "$data")
curl "http://example.com/api/projects/${encoded}/events"

If you want to have some characters which are passed through literally -- in many use cases, this is desired for /s -- instead use:

encoded=$(urlencode_safe='/' urlencode "$data")

Upvotes: 15
