pancho

Reputation: 186

How to retrieve the real redirect Location header with curl without using %{redirect_url}?

I realized that curl's %{redirect_url} does not always show the URL the server actually sent. For example, if the response header is Location: https:/\example.com, a browser will redirect to https:/\example.com, but curl's %{redirect_url} shows redirect_url: https://host-domain.com/https:/\example.com and does not display the real Location header from the response. (I'd like to see the real Location: value.)

This is the Bash script I'm working with:

#!/bin/bash
# Usage: urls-checker.sh domains.txt
FILE="$1"
while read -r LINE; do
     # read the response to a variable
     # the leading \n keeps the write-out fields on their own last line
     response=$(curl -H 'Cache-Control: no-cache' -s -k --max-time 2 --write-out '\n%{http_code} %{size_header} %{redirect_url} ' "$LINE")
     # get the title
     title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'<<<"$response")
     # read the write-out from the last line
     read -r http_code size_header redirect_url < <(tail -n 1 <<<"$response")
     printf "***Url: %s\n\n" "$LINE"
     printf "Status: %s\n\n" "$http_code"
     printf "Size: %s\n\n" "$size_header"
     printf "Redirect-url: %s\n\n" "$redirect_url"
     printf "Title: %s\n\n" "$title"
     # head -c 100 shows only the first 100 bytes of the response
     printf "Body: %s\n\n" "$(head -c 100 <<<"$response")"
done < "${FILE}"

How can I print Redirect-url: followed by the original Location: header value from the response, without having to use %{redirect_url}?

Upvotes: 7

Views: 20986

Answers (3)

Salem

Reputation: 774

Based on @randomir's answer, and since I only needed the raw redirect URL, I use this command in my batch script:

 curl  -w "%{redirect_url}" -o /dev/null -s "https://stackoverflow.com/q/46507336/3019002"
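If your curl is 7.84.0 or newer, --write-out can also print a response header directly through the %header{name} variable, so the raw Location value can be shown next to %{redirect_url} without any grep. A minimal sketch under that version assumption; raw_location is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch assuming curl >= 7.84.0, which added the %header{name} write-out variable.
# raw_location is a hypothetical helper name.
raw_location() {
    local url="$1"
    # -s: no progress meter; -o /dev/null: discard the body.
    # %header{location}: the Location value exactly as the server sent it.
    # %{redirect_url}: the URL curl resolved from it (what -L would follow).
    curl -s -o /dev/null \
         -w 'location: %header{location}\nredirect_url: %{redirect_url}\n' \
         "$url"
}
```

Usage: raw_location 'http://httpbin.org/redirect-to?url=http:/\example.com'. Older curl versions do not support %header{...}; the -i/grep approach in the next answer works there.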

Upvotes: 4

randomir

Reputation: 18697

To read the exact Location header field value, as returned by the server, you can use the -i/--include option, in combination with grep.

For example:

$ curl 'http://httpbin.org/redirect-to?url=http:/\example.com' -si | grep -oP 'Location: \K.*'
http:/\example.com
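grep -P is a GNU extension that not every grep provides, and the pattern above is case-sensitive and keeps the trailing CR of the header line. A portable sketch of the same extraction with awk; extract_location is a hypothetical helper name, and it assumes the raw headers (for example from curl -si, or curl -s -D - -o /dev/null) arrive on stdin:

```bash
#!/bin/bash
# Portable sketch of the `grep -oP 'Location: \K.*'` step.
# extract_location is a hypothetical helper name. It reads raw response
# headers on stdin and:
#   - strips the CR of each CRLF-terminated header line,
#   - stops at the blank line ending the header block (the body is never searched),
#   - matches the header name case-insensitively (HTTP/2 sends "location:"),
#   - prints the first Location value it finds.
extract_location() {
    awk '
        { sub(/\r$/, "") }                  # drop the trailing CR
        $0 == "" { exit }                   # end of the header block
        tolower(substr($0, 1, 9)) == "location:" {
            value = substr($0, 10)          # everything after "location:"
            sub(/^[ \t]+/, "", value)       # trim whitespace after the colon
            print value
            exit
        }
    '
}
```

In the script below it could replace the grep line as location=$(extract_location <<<"$headers").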

Or, if you want to read all the headers, the content and the --write-out variables line (as your script does):

# the leading \n keeps the write-out fields on their own last line
response=$(curl -H 'Cache-Control: no-cache' -s -i -k --max-time 2 --write-out '\n%{http_code} %{size_header} %{redirect_url} ' "$url")

# break the response in parts
headers=$(sed -n '1,/^\r$/p' <<<"$response")
content=$(sed -e '1,/^\r$/d' -e '$d' <<<"$response")
read -r http_code size_header redirect_url < <(tail -n1 <<<"$response")

# get the real Location
# -i: HTTP/2 sends a lowercase "location:"; [^\r]* drops the trailing CR
location=$(grep -oiP '^Location: \K[^\r]*' <<<"$headers")
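To check what each of the three splitting commands extracts without hitting the network, the same commands can be run on a hand-written response. This is a sketch: split_response is a hypothetical wrapper, the sample headers, body and write-out values are made up for illustration, and like the answer it relies on GNU sed understanding \r in a regex:

```bash
#!/bin/bash
# Offline sketch of the splitting steps, run on a made-up response.
# split_response is a hypothetical helper; it sets the globals
# headers, content, http_code, size_header and redirect_url.
split_response() {
    local response="$1"
    # status line + headers, up to and including the CR-only blank line
    headers=$(sed -n '1,/^\r$/p' <<<"$response")
    # everything after that blank line, minus the last (write-out) line
    content=$(sed -e '1,/^\r$/d' -e '$d' <<<"$response")
    # the write-out line: "%{http_code} %{size_header} %{redirect_url} "
    read -r http_code size_header redirect_url < <(tail -n1 <<<"$response")
}

# Made-up sample: headers, CRLF blank line, a body ending in a newline,
# then the --write-out line (values are illustrative only).
sample=$'HTTP/1.1 302 Found\r\nLocation: https:/\\example.com\r\n\r\n<title>Redirecting</title>\n302 53 https://host-domain.com/https:/\\example.com '
split_response "$sample"
printf 'code=%s size=%s\ncontent=%s\n' "$http_code" "$size_header" "$content"
```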

Fully integrated in your script, this looks like:

#!/bin/bash
# Usage: urls-checker.sh domains.txt
file="$1"
while read -r url; do
    # read the response to a variable
    # the leading \n keeps the write-out fields on their own last line
    response=$(curl -H 'Cache-Control: no-cache' -s -i -k --max-time 2 --write-out '\n%{http_code} %{size_header} %{redirect_url} ' "$url")

    # break the response in parts
    headers=$(sed -n '1,/^\r$/p' <<<"$response")
    content=$(sed -e '1,/^\r$/d' -e '$d' <<<"$response")
    read -r http_code size_header redirect_url < <(tail -n1 <<<"$response")

    # get the real Location
    # -i: HTTP/2 sends a lowercase "location:"; [^\r]* drops the trailing CR
    location=$(grep -oiP '^Location: \K[^\r]*' <<<"$headers")

    # get the title
    title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'<<<"$content")

    printf "***Url: %s\n\n" "$url"
    printf "Status: %s\n\n" "$http_code"
    printf "Size: %s\n\n" "$size_header"
    printf "Redirect-url: %s\n\n" "$location"
    printf "Title: %s\n\n" "$title"
    printf "Body: %s\n\n" "$(head -c 100 <<<"$content")"
done < "$file"
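A variant worth considering is to let curl keep the parts apart instead of splitting one string afterwards: -D writes the response headers to a file, -o writes the body to another, and --write-out then prints only the status fields. A sketch under these assumptions: mktemp -d is available, GNU sed is used for the title (as above), and check_url is a hypothetical function name:

```bash
#!/bin/bash
# Sketch: curl writes headers (-D) and body (-o) to separate temp files,
# and -w prints only the write-out fields on stdout.
# check_url is a hypothetical helper name; mktemp -d is assumed available.
check_url() {
    local url="$1" tmp http_code size_header location title
    tmp=$(mktemp -d) || return 1

    read -r http_code size_header < <(
        curl -H 'Cache-Control: no-cache' -s -k --max-time 2 \
             -D "$tmp/headers" -o "$tmp/body" \
             -w '%{http_code} %{size_header}\n' "$url"
    )

    # First Location header: name matched case-insensitively, CR removed.
    # 2>/dev/null: curl may not create the files if the request fails.
    location=$(awk '{ sub(/\r$/, "") }
        tolower(substr($0, 1, 9)) == "location:" {
            v = substr($0, 10); sub(/^[ \t]+/, "", v); print v; exit
        }' "$tmp/headers" 2>/dev/null)
    title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q' "$tmp/body" 2>/dev/null)

    printf "***Url: %s\n\n" "$url"
    printf "Status: %s\n\n" "$http_code"
    printf "Size: %s\n\n" "$size_header"
    printf "Redirect-url: %s\n\n" "$location"
    printf "Title: %s\n\n" "$title"
    printf "Body: %s\n\n" "$(head -c 100 "$tmp/body" 2>/dev/null)"

    rm -rf "$tmp"
}
```

Usage: while read -r url; do check_url "$url"; done < domains.txt. Since the body never shares a string with the write-out fields, there is no last line to strip.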

Upvotes: 8

Daniel Stenberg

Reputation: 58124

https:/\example.com is not a legal URL (*). The fact that this works in browsers is an abomination (that I've fought against), and curl does not support it. %{redirect_url} shows exactly the URL curl would redirect to.

A URL should use two forward slashes, so the above should look like https://example.com.

(*) = I refuse to accept the WHATWG "definition".
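For context on the browser behavior: browsers follow the WHATWG URL Standard, which treats a backslash like a forward slash in URLs with special schemes such as http and https, while curl does not. If a link checker should still report roughly where a browser would end up, a crude sketch (not a URL parser; browser_like_location is a hypothetical helper) could rewrite the raw Location value:

```bash
#!/bin/bash
# Crude approximation, NOT a WHATWG URL parser: for absolute http(s)
# values, replace every backslash with a forward slash, roughly what a
# browser does. It ignores the rest of the standard (for example, it does
# not add the trailing "/" a browser would, or touch relative values).
# browser_like_location is a hypothetical helper name.
browser_like_location() {
    local loc="$1"
    case "$loc" in
        [Hh][Tt][Tt][Pp]:* | [Hh][Tt][Tt][Pp][Ss]:*)
            printf '%s\n' "${loc//\\//}" ;;   # every backslash -> slash
        *)
            printf '%s\n' "$loc" ;;           # leave other values untouched
    esac
}

browser_like_location 'https:/\example.com'
```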

Upvotes: 0
