Shushiro

Reputation: 582

Linux Bash: cURL - how to pass variables to the URL

I want to make a GET request with cURL, using the following URL:

https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand

At the end of the URL there are some values that I want to turn into variables, so that the URL changes with the input and I can request a different resource each time.

This is the end of the URL; $ab, $start, $end and $strand are the variables, and all of them are strings.

...2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand

I came across "urlencode" and thought of storing my URL as one big string in a variable and passing it to urlencode, but I am not sure how to do it.

This is what I tried; I am looking for something along these lines:

#!/bin/bash
[...]
cURL="https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
  response=$(curl -X HEAD -I --header 'Accept:txt/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function="$location""|head -n1|awk '{print $2}')

  echo "$response"

# getting information via curl request
  if [ $response = 200 ] ; then
    info=$(curl -G "$ (urlencode "$cURL")")
  fi

  echo $info

For the response-code check, passing $location directly seems to work, but with more variables I get an error (response code 100, whereas the response-code check returns 200).

Am I misunderstanding something fundamental about curl or urlencode? What did I miss?

Thanks in advance for your time and effort :)

UPDATE

#!/bin/bash
# bash is required here: the script uses arrays, which plain sh does not support
# handling command-line input
file=$1
ecf=$2


# iterating through file and pulling out
# information for the GET- and POST-request

while read -r line
  do
    parent=$(echo "$line" | awk '{print substr($1,2,3)}')
    start=$(echo "$line" | awk '{print substr($2,2,6)}')
    end=$(echo "$line" | awk '{print substr($3,2,6)}')
    strand=$(echo "$line" | awk '{print substr($4,2,1)}')
    locus=$(echo "$line" | awk '{print substr($6,2,8)}')

# depending on $parent, the right insertion for the URL is generated
    if [ $parent = "SMc" ] ; then
      location="Genome"
      ab="SMc"
    elif [ $parent = "SMa" ] ; then
      location="PrintPsyma"
      ab="pSymA"
    else  # remaining case: $parent is "SMb"
      location="PrintPsymb"
      ab="pSymB"
    fi
# building variables for curl content request


  options=( --compressed)

  headers=(
    -H 'Host: iant.toulouse.inra.fr'
    -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0'
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3'
    -H "Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent"
    -H 'Content-Type: application/x-www-form-urlencoded'
    -H 'Connection: keep-alive'
    -H 'Upgrade-Insecure-Requests: 1'
    -H 'Pragma: no-cache'
    -H 'Cache-Control: no-cache'
  )

    url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'

    ab=$(urlencode "${ab}")
    start=$(urlencode "${start}")
    end=$(urlencode "${end}")
    strand=$(urlencode "${strand}")
    data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"




# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
    response=$(curl -I --header 'Accept: text/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function=${location}" | head -n1 | awk '{print $2}')

    echo "$response"

# getting information via curl request
    if [ "$response" = 200 ] ; then
        info=$(curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}")
    fi

    echo "$info"

done < "$file"

Upvotes: 3

Views: 12501

Answers (1)

Yoory N.

Reputation: 5464

You need to separate the concepts. The string you put in the cURL variable is not a URL; it is a URL plus a set of headers, the request parameters, and one option for compression. These are all different things.

Define them separately like this:

url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'
headers=(
    -H 'Host: iant.toulouse.inra.fr'
    -H 'User-Agent: ...'
    -H 'Accept: ...'
    -H 'Accept-Language: ...'
    # ... other headers from your example ...
)
options=(
    --compressed
)
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

And then run curl in this fashion:

curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}"

This expands to the correct curl command, because each array element is passed to curl as its own argument.
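
To see why arrays matter here, the following minimal sketch prints the arguments a command would actually receive. show_args is a hypothetical helper introduced only for this illustration, and only two of the headers are repeated to keep it short.

```shell
#!/bin/bash
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, wrapped in angle brackets.
show_args() {
    printf '<%s>\n' "$@"
}

headers=(
    -H 'Host: iant.toulouse.inra.fr'
    -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3'
)
options=( --compressed )

# Quoted array expansion: every element stays one argument, spaces included.
show_args "${options[@]}" "${headers[@]}"

# A flat string expanded unquoted: the shell splits it on whitespace and
# keeps the quote characters as literal text, so the header is torn apart.
flat="-H 'Host: iant.toulouse.inra.fr'"
show_args $flat
```

The first call prints five arguments with each header value intact. The second prints three, the middle one being the fragment `'Host:`, which is roughly what happened to the single big string in the question.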

About the urlencode part: you need to encode each of $ab, $start, $end and $strand separately. If you insert them into the string and then encode the whole thing, all special characters in that string, such as & and =, will be encoded too, and the already encoded ones, such as %2F in your example, will be encoded twice (%2F becomes %252F).
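
The double-encoding effect can be reproduced with a small sketch. urlencode is not a standard command, so the function below is an assumed pure-bash stand-in for whichever implementation you use; it is meant for ASCII input only. The value of start is a made-up example containing a space.

```shell
#!/bin/bash
# Hypothetical pure-bash urlencode: percent-encodes every character except
# the unreserved set (letters, digits, '.', '~', '_', '-').
urlencode() {
    local LC_ALL=C              # make the character ranges below plain ASCII
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

start='12 345'   # made-up value with a space, for illustration only

# Encoding the assembled string breaks the separators and re-encodes %2F:
urlencode "db%2Fx&begin=$start"
# prints: db%252Fx%26begin%3D12%20345

# Encoding only the value keeps the structure of the query string:
printf 'db%%2Fx&begin=%s\n' "$(urlencode "$start")"
# prints: db%2Fx&begin=12%20345
```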

To keep the code tidy, you can encode them beforehand:

ab=$(urlencode "${ab}")
start=$(urlencode "${start}")
end=$(urlencode "${end}")
strand=$(urlencode "${strand}")
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

... or do it inline, which is more cumbersome:

data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$(urlencode "${ab}").genomic&begin=$(urlencode "${start}")&end=$(urlencode "${end}")&strand=$(urlencode "${strand}")"
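
As a possible alternative to a separate urlencode helper, curl can do the encoding itself: the --data-urlencode option encodes the part after the first = of each name=value pair, and with -G the pairs are appended to the URL as a query string. The sketch below only prints the resulting command so it can be inspected without contacting the server; the values of ab, start, end and strand are made-up samples, and the fastafile path is the decoded form of the %2F string above.

```shell
#!/bin/bash
# Made-up sample values; in the real script they come from the input file.
ab='pSymA' start='1000' end='2000' strand='+'

url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'

# One --data-urlencode per parameter; curl encodes each value on its own,
# so the raw (unencoded) path is given for fastafile.
params=(
    --data-urlencode '__wb_function=PortalExtractSeq'
    --data-urlencode 'mode=run'
    --data-urlencode 'species=rhime'
    --data-urlencode "fastafile=/www/bacteria/annotation//site/prj/rhime//db/${ab}.genomic"
    --data-urlencode "begin=${start}"
    --data-urlencode "end=${end}"
    --data-urlencode "strand=${strand}"
)

# Print the command, shell-quoted, instead of running it.
printf '%q ' curl -G --compressed "${params[@]}" "$url"
echo
# To send the request: curl -G --compressed "${params[@]}" "$url"
```

With this form the variables must not be passed through urlencode first, otherwise they would be encoded twice.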

I hope this helps.

Upvotes: 3
