Jecki

Reputation: 802

Encode URLs as UTF-8 in a shell script

I'm using the following function to URL-encode URLs and titles in my Bash script:

urlencode() {
    # urlencode <string>
    old_lc_collate=$LC_COLLATE
    LC_COLLATE=C

    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done

    LC_COLLATE=$old_lc_collate
}

The output for some parameters is as follows:

description=%627%644%639%628%627%62F%64A
downloadurl=http%3A%2F%2Fmedia.myhomepage.com%2Fmedia%2FVT-142437WE-WEB-IRQ-MOSUL-PROGRESS-HAWAMDA_2017-03-08_14%3A28%3A12.mp4
title=%623%647%644GKxS7otlAsujiRxXHTvshUE9

I'm also using the following Java code to encode the same parameters:

URLEncoder.encode(video.getHeadline() , UTF_8_ENCODING).replace("+", "%20");
URLEncoder.encode(video.getHeadline() , UTF_8_ENCODING)

and its output differs from the Bash output:

description=%D8%A7%D9%82%D8%AA%D8%AD%D9%85%D8%AA%20%D8%A7%D9%84%D9%82%D9%88%D8%A7%D8%AA%20%D8%A7%D9%84%D8%B9%D8%B1%D8%A7%D9%82%D9%8A%D8%A9%20%D8%AD%D9%8A%20%D8%A7%D9%84%D9%85%D9%86%D8%B5%D9%88%D8%B1%20%D8%BA%D8%B1%D8%A8%20%D8%A7%D9%84%D9%85%D9%88%D8%B5%D9%84%20%D8%B6%D9%85%D9%86%20%D8%AA%D9%82%D8%AF%D9%85%D9%87%D8%A7%20%D9%81%D9%8A%20%D8%A7%D9%84%D8%B3%D8%A7%D8%AD%D9%84%20%D8%A7%D9%84%D8%BA%D8%B1%D8%A8%D9%8A%20%D9%85%D9%86%20%D8%A7%D9%84%D9%85%D8%AF%D9%8A%D9%86%D8%A9%20%D8%AA%D9%85%D9%87%D9%8A%D8%AF%D8%A7%20%D9%84%D8%A7%D8%B3%D8%AA%D8%B9%D8%A7%D8%AF%D8%AA%D9%87%D8%A7%20%D9%85%D9%86%20%D8%AF%D8%A7%D8%B9%D8%B4.%20%D9%85%D9%86%20%D8%AC%D9%87%D8%A9%20%D8%A3%D8%AE%D8%B1%D9%89%20%D8%AE%D9%8A%D9%91%D8%B1%20%D8%B1%D8%A6%D9%8A%D8%B3%20%D8%A7%D9%84%D9%88%D8%B2%D8%B1%D8%A7%D8%A1%20%D8%A7%D9%84%D8%B9%D8%B1%D8%A7%D9%82%D9%8A%20%D8%AD%D9%8A%D8%AF%D8%B1%20%D8%A7%D9%84%D8%B9%D8%A8%D8%A7%D8%AF%D9%8A%20%D9%85%D8%B3%D9%84%D8%AD%D9%8A%20%D8%A7%D9%84%D8%AA%D9%86%D8%B8%D9%8A%D9%85%20%D8%A8%D9%8A%D9%86%20%D8%A7%D9%84%D8%A7%D8%B3%D8%AA%D8%B3%D9%84%D8%A7%D9%85%20%D9%88%D8%A7%D9%84%D9%82%D8%AA%D9%84.


downloadurl=http%3A%2F%2FFmedia.myhomepage.com%2Fmedia%2Fvideos%2F2017%2F03%2F08%2FVT-142437WE-WEB-IRQ-MOSUL-PROGRESS-HAWAMDA_2017-03-08_14%3A28%3A12.mp4


title=%D8%A7%D9%84%D8%B9%D8%A8%D8%A7%D8%AF%D9%8A%20%D9%8A%D8%AE%D9%8A%D8%B1%20%D9%85%D8%B3%D9%84%D8%AD%D9%8A%20%D8%AF%D8%A7%D8%B9%D8%B4%20%D8%A8%D9%8A%D9%86%20%D8%A7%D9%84%D8%A7%D8%B3%D8%AA%D8%B3%D9%84%D8%A7%D9%85%20%D9%88%D8%A7%D9%84%D9%82%D8%AA%D9%84

Please advise how I can achieve the same output as Java in Bash. In other words, what is the counterpart of java.net.URLEncoder.encode() in a Bash shell?

Upvotes: 1

Views: 267

Answers (2)

randomir

Reputation: 18697

If you need to url-encode data only to later pass it to curl (as you mention in comments), I would recommend letting curl take care of encoding for you with the --data-urlencode <data> option.

For example:

title="Mačka"
url="http://google.com/?q=mačka"
curl -G "example.com/?foo=bar" --data-urlencode "title=$title" --data-urlencode "url=$url"

Makes the request like:

GET /?foo=bar&title=Ma%C4%8Dka&url=http%3A%2F%2Fgoogle.com%2F%3Fq%3Dma%C4%8Dka

Note the -G option, which forces the GET method; without it, any of the --data-* options makes curl default to POST and send the parameters in the request body.

Upvotes: 1

guido

Reputation: 19194

This one-liner relies on xxd to get a plain hex dump of the string's bytes, then prepends a % escape to every byte:

string="العبادي يخير مسلحي داعش بين الاستسلام والقتل"
printf '%s' "$string" | xxd -p | tr -d '\n' | sed 's/../%&/g'

Result:

%d8%a7%d9%84%d8%b9%d8%a8%d8%a7%d8%af%d9%8a%20%d9%8a%d8%ae%d9%8a%d8%b1%20%d9%85%d8%b3%d9%84%d8%ad%d9%8a%20%d8%af%d8%a7%d8%b9%d8%b4%20%d8%a8%d9%8a%d9%86%20%d8%a7%d9%84%d8%a7%d8%b3%d8%aa%d8%b3%d9%84%d8%a7%d9%85%20%d9%88%d8%a7%d9%84%d9%82%d8%aa%d9%84

The hex digits in percent escapes are case-insensitive, so this is equivalent to the Java output; add xxd's -u option if you want uppercase digits.
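
The same byte-wise idea can also be written as a function that leaves unreserved ASCII characters (letters, digits, - . _ ~) unescaped, which is closer to what the question's urlencode() was aiming for. This is only a minimal sketch: the name urlencode_bytes is made up for illustration, and it assumes Bash 4 or later (for ${hex^^}) plus a POSIX od. It is not an exact clone of java.net.URLEncoder, which encodes ~ and leaves * unescaped, so the two can differ for those characters.

```shell
#!/usr/bin/env bash

# urlencode_bytes <string>  (hypothetical helper name)
# Percent-encodes the string byte by byte, so a multi-byte UTF-8
# character becomes several %XX escapes.
urlencode_bytes() {
    local hex ch out=''
    # od -An -v -tx1 prints every input byte as two lowercase hex digits.
    for hex in $(printf '%s' "$1" | od -An -v -tx1); do
        case $hex in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved byte (0-9, A-Z, a-z, - . _ ~): keep it literal.
                printf -v ch "\\x$hex"
                out+=$ch ;;
            *)
                # Any other byte: emit an uppercase %XX escape.
                out+=%${hex^^} ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For instance, urlencode_bytes "Mačka" should print Ma%C4%8Dka, and urlencode_bytes "a b" should print a%20b.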

Upvotes: 0
