Reputation: 41
I was trying to find a way of using wget to log the list of redirect target URLs to one file. For example:
www.website.com/1234
now redirects to www.newsite.com/a2as4sdf6nonsense
and
www.website.com/1235
now redirects to www.newsite.com/ab6haq7ah8nonsense
Wget does output the redirect, but doesn't log the new location. I get this in the terminal:
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.newsite.com/a2as4sdf6
...
I would just like to capture that new URL to a file.
I was using something like this:
for i in `seq 1 9999`; do
    wget http://www.website.com/$i -O output.txt
done
But this writes the source code of each web page to that file. I want to capture only the redirect information, and I would like a new line appended to the same output file for each URL it retrieves.
I would like the output to look something like:
www.website.com/1234 www.newsite.com/a2as4sdf6nonsense
www.website.com/1235 www.newsite.com/ab6haq7ah8nonsense
...
Upvotes: 4
Views: 3552
Reputation: 31182
It's not a perfect solution, but it works:
wget http://tinyurl.com/2tx --server-response -O /dev/null 2>&1 |\
    awk '(NR==1){SRC=$3;} /^  Location: /{DEST=$2} END{ print SRC, DEST}'
wget is not a perfect tool for that; curl would be a bit better.
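For example, curl can follow the redirects itself and print just the final URL (a sketch, not part of the original answer; the tinyurl address is only the same test URL as above):
curl -s -o /dev/null -L -w '%{url_effective}\n' http://tinyurl.com/2tx
Here -L follows the redirects, -o /dev/null discards the page body, and -w prints the URL that was finally used.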
This is how the wget/awk pipeline works: we fetch the URL, but redirect all output (the page content) to /dev/null. We ask for the server response HTTP headers (to get the Location header), then pass them to awk. Note that there might be several redirections; I assumed you want the last one. Awk takes the URL you asked for from the first line (NR==1) and the destination URL from each Location header. At the end, it prints both SRC and DEST, as you wanted.
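To produce the two-column file from the question, the one-liner above can be wrapped in the same kind of loop, appending one line per URL (a sketch; the URL range comes from the question, while the redirects.txt file name is an assumption):
for i in $(seq 1 9999); do
    # Discard the page body, capture the server headers on stderr, and append
    # "requested-URL last-Location" to redirects.txt (an assumed file name).
    wget "http://www.website.com/$i" --server-response -O /dev/null 2>&1 |
        awk '(NR==1){SRC=$3} /^  Location: /{DEST=$2} END{print SRC, DEST}' >> redirects.txt
done
Note that the first column will include the http:// prefix, since it is taken from wget's own status line.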
Upvotes: 2