Reputation: 21170
I'm using Node.js and the request module. I'm trying to scrape data from a web page, but my data comes from an API which only gives me link-tracking URLs.
For instance, this link:
http://www.kqzyfj.com/click-7227532-11292048?url=http%3A%2F%2Fwww.urbanoutfitters.com%2Furban%2Fcatalog%2Fproductdetail.jsp%3Fid%3D27074590
Actually leads here:
http://www.urbanoutfitters.com/urban/catalog/productdetail.jsp?id=27074590&cm_mmc=CJ-_-Affiliates-_-Threadfinder-_-11292048
I'm aware that most of the link is embedded in the original URL, but this isn't always the case, so please ignore it / don't post answers which suggest regex'ing my way out of this!
Using Request, how can I grab the page's URL (that is, the second link that the first redirects to) and store it as a variable?
Upvotes: 0
Views: 2331
Reputation: 7585
Check out line #77 of request.js. It provides an internal array named redirects, reachable through the response's request object:
    var request = require('request');
    var url = "http://www.kqzyfj.com/click-7227532-11292048?url=http%3A%2F%2Fwww.urbanoutfitters.com%2Furban%2Fcatalog%2Fproductdetail.jsp%3Fid%3D27074590";
    request(url, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            // Print the redirect history as JSON
            console.log("%j", response.request.redirects);
        }
    });
This prints a JSON representation of the redirect history: an array whose entries each hold a status code and the redirect URL. (I found 3 redirects for the URL you provided.)
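If you'd rather read the chain one hop per line than as raw JSON, something like the sketch below may help. It assumes each entry has the statusCode and redirectUri fields used in this thread; describeRedirects is a hypothetical helper name, and the sample array uses made-up example.com placeholders, not the actual hops for your URL.

```javascript
// Hypothetical helper: turns a redirect history array into readable lines.
// Assumes entries shaped like { statusCode: Number, redirectUri: String }.
function describeRedirects(redirects) {
  return redirects.map(function (hop, i) {
    return (i + 1) + '. ' + hop.statusCode + ' -> ' + hop.redirectUri;
  });
}

// Made-up sample data for illustration only.
var sampleRedirects = [
  { statusCode: 302, redirectUri: 'http://example.com/step-1' },
  { statusCode: 301, redirectUri: 'http://example.com/final' }
];

describeRedirects(sampleRedirects).forEach(function (line) {
  console.log(line);
});
```

Inside the request callback you would pass response.request.redirects instead of the sample array.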
Upvotes: 0
Reputation: 25161
This should do it:
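    var request = require('request');
    // url is the tracking link from the question
    request(url, function (err, res, body) {
        if (err) { return console.error(err); }
        // Inside the callback, `this` is the Request object.
        // The last entry in its redirect history is the final URL.
        if (this.redirects.length) {
            var destUrl = this.redirects[this.redirects.length - 1].redirectUri;
            console.log(destUrl);
        }
    });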
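One thing to watch for: if the link doesn't redirect at all, the array is empty and destUrl never gets set. A minimal sketch of a fallback, assuming the same { statusCode, redirectUri } entry shape as above (finalUrl is a hypothetical helper name, and the example.com URLs are placeholders):

```javascript
// Hypothetical helper: returns the last redirectUri in the history,
// or the original URL when no redirect happened.
function finalUrl(redirects, originalUrl) {
  if (!Array.isArray(redirects) || redirects.length === 0) {
    return originalUrl;
  }
  return redirects[redirects.length - 1].redirectUri;
}

// Made-up sample data for illustration only.
var history = [
  { statusCode: 302, redirectUri: 'http://example.com/step-1' },
  { statusCode: 301, redirectUri: 'http://example.com/final' }
];

console.log(finalUrl(history, 'http://example.com/start'));
console.log(finalUrl([], 'http://example.com/start'));
```

In the callback you could then write var destUrl = finalUrl(this.redirects, url); and always end up with a value.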
Upvotes: 1