user1467663

Reputation: 41

Using wget to log redirected URLs in a shell script

I was trying to find a way of using wget to log the list of redirected URLs into one file. For example:

www.website.com/1234 now redirects to www.newsite.com/a2as4sdf6nonsense

and

www.website.com/1235 now redirects to www.newsite.com/ab6haq7ah8nonsense

Wget does output the redirect, but doesn't log the new location. I get this in the terminal:

    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://www.newsite.com/a2as4sdf6

...

I would just like to capture that new URL to a file.

I was using something like this:

    for i in `seq 1 9999`; do
        wget http://www.website.com/$i -O output.txt
    done

But this writes the source code of each page to that file. I am trying to retrieve only the redirect info. Also, I would like to append a new line to the same output file each time it retrieves a new URL.

I would like the output to look something like:

    www.website.com/1234 www.newsite.com/a2as4sdf6nonsense
    www.website.com/1235 www.newsite.com/ab6haq7ah8nonsense

...

Upvotes: 4

Views: 3552

Answers (1)

Michał Šrajer

Reputation: 31182

It's not a perfect solution, but it works:

    wget http://tinyurl.com/2tx --server-response -O /dev/null 2>&1 |\
       awk '(NR==1){SRC=$3;} /^  Location: /{DEST=$2} END{ print SRC, DEST}'
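To get the two-column file described in the question, a sketch that drops this pipeline into your loop could look like this (the redirects.txt filename is just an assumption):

    for i in $(seq 1 9999); do
        # --server-response prints the HTTP headers; the page body goes to /dev/null
        wget "http://www.website.com/$i" --server-response -O /dev/null 2>&1 |
            awk '(NR==1){SRC=$3} /^  Location: /{DEST=$2} END{print SRC, DEST}' >> redirects.txt
    done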

wget is not a perfect tool for this. curl would be a bit better.

This is how it works: we fetch the URL, but redirect all output (the page content) to /dev/null. We ask for the server response HTTP headers (to get the Location header), then pass them to awk. Note that there might be several redirections; I assumed you want the last one. Awk takes the URL you asked for from the first line (NR==1) and the destination URL from each Location header. At the end, it prints both SRC and DEST, as you wanted.
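For comparison, a minimal curl sketch (assuming the same URL range from the question): -L follows redirects, -o /dev/null discards the body, and the -w format variable %{url_effective} prints the final URL after all redirects have been followed.

    for i in $(seq 1 9999); do
        # %{url_effective} expands to the last URL curl fetched after following redirects
        final=$(curl -s -L -o /dev/null -w '%{url_effective}' "http://www.website.com/$i")
        echo "www.website.com/$i $final" >> redirects.txt
    done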

Upvotes: 2
