Reputation: 195
I have been trying to test URL redirects using Google Apps Script in Google Sheets. I can get the response code and the final redirect URL for some of the links, but most of them do not work.
Examples of the links I would like to check:
https://www.airbnb.com/rooms/4606613
https://www.airbnb.com/rooms/4661522
https://www.airbnb.com/rooms/6014647
https://www.airbnb.com/rooms/14452305
https://www.airbnb.com/rooms/15910617
Basically, I need to check whether those links redirect to https://www.airbnb.com/s/homes
Using the script below, I get the following list, which is not correct since all of them will redirect to https://www.airbnb.com/s/homes:
https://www.airbnb.com/rooms/4606613
https://www.airbnb.com/s/homes
https://www.airbnb.com/s/homes
https://www.airbnb.com/rooms/14452305
https://www.airbnb.com/rooms/15910617
It seems that the website takes about 1 second to perform the redirect, which could be the issue.
Here is the code:
function urlProtocol(url) {
  return URI(url).protocol()
}

function urlHostname(url) {
  return URI(url).hostname()
}

function getRedirects(url) {
  eval(UrlFetchApp.fetch('https://rawgit.com/medialize/URI.js/gh-pages/src/URI.js').getContentText());
  var params = {
    'followRedirects': false,
    'muteHttpExceptions': true
  };
  var baseUrl = urlProtocol(url) + "://" + urlHostname(url),
    response = UrlFetchApp.fetch(url, params),
    responseCode = response.getResponseCode();
  if (response.getHeaders()['Location']) {
    var redirectedUrl = getRedirects(baseUrl + response.getHeaders()['Location']);
    return redirectedUrl;
  } else {
    return url;
  }
}
Upvotes: 0
Views: 3496
Reputation: 10365
It seems that the final redirect on some of the URLs happens after the page has loaded. Most likely a client-side script changes window.location. Therefore your logic, although correct for HTTP redirects, fails to catch such pages.
To make matters worse, the after-load redirects seem to be inconsistent: sometimes the pages you provided are not redirected to https://www.airbnb.com/s/homes. I was able to stop this redirect from happening, so the theory is confirmed; I will update with what exactly causes it.
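One rough way to flag such pages from a script is to inspect the HTML of the final response for signs of a client-side redirect. The sketch below is only a heuristic: the helper name looksLikeClientRedirect and both patterns are assumptions, and it is not confirmed that Airbnb uses either mechanism. In Apps Script the HTML would come from response.getContentText().

```javascript
/**
 * Heuristic: does the HTML appear to trigger a redirect after load?
 * Hypothetical helper; the patterns are illustrative, not exhaustive.
 * @param {string} html page source
 * @returns {boolean}
 */
const looksLikeClientRedirect = (html) => {
  // e.g. <meta http-equiv="refresh" content="1; url=/s/homes">
  const metaRefresh = /<meta[^>]+http-equiv=["']?refresh["']?/i;
  // e.g. window.location = "...", location.href = "...", location.replace("...")
  // The [^=] part skips comparisons such as location.href === "..."
  const scripted =
    /\b(?:window\.)?location(?:\.href)?\s*=[^=]|\blocation\.(?:replace|assign)\s*\(/;
  return metaRefresh.test(html) || scripted.test(html);
};
```

A page that builds the redirect inside a bundled or minified script may not match either pattern, so a negative result does not prove that the page stays put.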
Apart from that, there are several optimizations you can apply to your script:
1. Get rid of eval and, actually, of the whole library unless you really need it (see how to do the same in just two lines). Improved security is the main benefit: no eval() of external scripts means fewer possibilities for a breach.
2. Check for the Location header (as a precaution).
/**
 * Builds a checker for the given target URL.
 * @param {string} target URL the redirect chain is compared against
 * @returns {function(string): boolean}
 */
const getRedirects = (target) =>
  /**
   * @param {string} url URL to check
   * @returns {boolean} true if the chain ends on a non-redirect response,
   *   false if it reaches target or a redirect has no Location header
   */
  (url) => {
    if (url === target) {
      return false;
    }
    const response = UrlFetchApp.fetch(url, {
      'followRedirects': false,
      'muteHttpExceptions': true
    });
    const code = response.getResponseCode();
    let { Location } = response.getHeaders();
    // Not a redirect: the chain ends here without reaching target
    if (code < 300 || code >= 400) {
      return true;
    }
    if (!Location) {
      return false;
    }
    // Relative Location: prepend protocol and host of the current URL
    if (/^\/\w+/.test(Location)) {
      const [protocol, , base] = url.split("/");
      Location = `${protocol}//${base}${Location}`;
    }
    console.log(Location);
    return getRedirects(target)(Location);
  };

const testRedirects = () => {
  const redirectsToHome = getRedirects("https://www.airbnb.com/s/homes");
  const accessible = [
    "https://www.airbnb.com/rooms/23861670",
    "https://www.airbnb.com/rooms/4606613",
    "https://www.airbnb.com/rooms/4661522",
    "https://www.airbnb.com/rooms/6014647",
    "https://www.airbnb.com/rooms/14452305",
    "https://www.airbnb.com/rooms/15910617"
  ].filter(redirectsToHome);
  console.log(accessible);
};
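The library-free URL handling mentioned in the list above can be sketched with plain string operations. The helper names originOf, resolveLocation, and finalUrl are hypothetical, and fetchHeaders is an injected stand-in for a UrlFetchApp.fetch call made with followRedirects set to false. The hop limit is an added assumption to guard against redirect loops; it is not part of the script above.

```javascript
/**
 * Protocol and host of an absolute URL, without an external library.
 * @param {string} url e.g. "https://www.airbnb.com/rooms/1"
 * @returns {string} e.g. "https://www.airbnb.com"
 */
const originOf = (url) => {
  // "https://host/path".split("/") gives ["https:", "", "host", "path"]
  const [protocol, , host] = url.split("/");
  return `${protocol}//${host}`;
};

/**
 * Turns a root-relative Location value into an absolute URL.
 * Absolute values are returned unchanged.
 * @param {string} url URL that produced the redirect
 * @param {string} location value of the Location header
 * @returns {string}
 */
const resolveLocation = (url, location) =>
  location.startsWith("/") && !location.startsWith("//")
    ? originOf(url) + location
    : location;

/**
 * Follows Location headers until a response is not a redirect,
 * has no Location, or maxHops requests have been made.
 * @param {string} url
 * @param {function(string): {code: number, location: (string|undefined)}} fetchHeaders
 * @param {number} [maxHops] assumed safety limit
 * @returns {string} last URL reached
 */
const finalUrl = (url, fetchHeaders, maxHops = 10) => {
  let current = url;
  for (let hop = 0; hop < maxHops; hop++) {
    const { code, location } = fetchHeaders(current);
    const isRedirect = code >= 300 && code < 400;
    if (!isRedirect || !location) {
      return current;
    }
    current = resolveLocation(current, location);
  }
  return current; // hop limit reached, possibly a redirect loop
};
```

In Apps Script, fetchHeaders could wrap UrlFetchApp.fetch(url, { followRedirects: false, muteHttpExceptions: true }) and return the result of getResponseCode() together with the Location entry of getHeaders(). Passing the fetcher in as an argument also makes the logic testable without network access.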
Since you clarified that the function is used as a custom function, you can add a wrapper that serves as the public API and calls the utility, so that you can reference it in a cell. Something like this:
const checkIfRedirects = (source, target = "https://www.airbnb.com/s/homes") => getRedirects(target)(source);
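Note that, as written, it returns false when the URL ends up at the target and true when the chain ends on some other page. You can then use it as you would a formula: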
=checkIfRedirects(A20)
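When a custom function is given a range such as A2:A20 instead of a single cell, Apps Script passes it a two-dimensional array of values. A minimal sketch of a range-aware helper follows; the name mapCells is hypothetical, and skipping empty cells is an assumption about the desired behaviour.

```javascript
/**
 * Applies fn to a single cell value or to every cell of a 2D range.
 * Empty cells in a range are passed through as empty strings.
 * @param {*|Array<Array<*>>} input single value or array of rows
 * @param {function(*): *} fn function applied to each non-empty cell
 * @returns {*|Array<Array<*>>} result with the same shape as input
 */
const mapCells = (input, fn) =>
  Array.isArray(input)
    ? input.map((row) => row.map((cell) => (cell === "" ? "" : fn(cell))))
    : fn(input);
```

A wrapper such as (range) => mapCells(range, (url) => checkIfRedirects(url)) could then be exposed as a custom function and called on a whole column at once. Each cell still triggers its own fetches, so large ranges may run into Apps Script execution limits.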
Upvotes: 2