Excelsson

Reputation: 195

How to check for URL redirects in Google Sheets with Google Apps Script

I have been trying to run some URL redirect tests using Google Apps Script in Google Sheets. I can get the response code and the final redirect URL for some of the links, but most of them are not working.

Examples of the links I would like to check:

https://www.airbnb.com/rooms/4606613

https://www.airbnb.com/rooms/4661522

https://www.airbnb.com/rooms/6014647

https://www.airbnb.com/rooms/14452305

https://www.airbnb.com/rooms/15910617

Essentially, I need to check whether those links redirect to https://www.airbnb.com/s/homes

Using the script below, I get the following list, which is not correct since all of them will redirect to https://www.airbnb.com/s/homes:

https://www.airbnb.com/rooms/4606613

https://www.airbnb.com/s/homes

https://www.airbnb.com/s/homes

https://www.airbnb.com/rooms/14452305

https://www.airbnb.com/rooms/15910617

It seems that the website takes about 1 second to perform the redirect, which could be the issue.

The code is below:

function urlProtocol(url){
  return URI(url).protocol()
}

function urlHostname(url){
  return URI(url).hostname()
}

function getRedirects(url) {
  eval(UrlFetchApp.fetch('https://rawgit.com/medialize/URI.js/gh-pages/src/URI.js').getContentText());

  var params = {
    'followRedirects': false,
    'muteHttpExceptions': true
  };

  var baseUrl = urlProtocol(url) + "://" + urlHostname(url),
      response = UrlFetchApp.fetch(url, params),
      responseCode = response.getResponseCode();

  if(response.getHeaders()['Location']){
    var redirectedUrl = getRedirects(baseUrl + response.getHeaders()['Location']);
    return redirectedUrl;
  } else {
    return url;
  }
}

Upvotes: 0

Views: 3496

Answers (1)

0Valt

Reputation: 10365

It seems that the final redirect on some of the URLs happens after the page has loaded. Most likely a client-side script changes window.location. Your logic is correct for HTTP redirects, but it cannot catch such pages, because UrlFetchApp only sees the HTTP response and does not execute client-side scripts.

To make matters worse, the after-load redirect seems to be inconsistent: sometimes the pages you provided are not redirected to https://www.airbnb.com/s/homes. I was able to stop this redirect from happening, so the theory is confirmed. I will update the answer with what exactly causes it.
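As a rough illustration of the idea above, the fetched HTML could be scanned for common client-side redirect patterns. This is only a heuristic sketch: the function name `findClientSideRedirectHints` and both patterns are assumptions, not something taken from the Airbnb pages, and a redirect triggered from an external or bundled script would not be detected this way.

```javascript
// Heuristic only: looks for common client-side redirect patterns in an HTML string.
// `findClientSideRedirectHints` is a hypothetical helper, not part of Apps Script.
const findClientSideRedirectHints = (html) => {
  const hints = [];

  // e.g. <meta http-equiv="refresh" content="1; url=/s/homes">
  if (/<meta[^>]+http-equiv\s*=\s*["']?refresh/i.test(html)) {
    hints.push("meta-refresh");
  }

  // e.g. window.location = "...", location.href = "...", location.replace("...")
  if (/\blocation(\.href)?\s*=[^=]|\blocation\.(replace|assign)\s*\(/.test(html)) {
    hints.push("script-location");
  }

  return hints;
};
```

In Apps Script, the HTML string could come from `UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText()`. An empty result does not prove that no client-side redirect exists.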


Apart from that, there are several optimizations you can apply to your script:

  1. Get rid of eval and, in fact, of the whole library unless you really need it (the same can be done in just two lines). Improved security is the main benefit: no eval() of external scripts means fewer possibilities for a breach.
  2. Check that the status code is in the 3xx range before reading the Location header (as a precaution).

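
A minimal sketch of both points, assuming the input is always an absolute URL such as "https://host/path". The names `urlProtocol` and `urlHostname` mirror the helpers in the question, while `isRedirectCode` is a hypothetical name introduced here. Note that the value returned by `urlHostname` would also include a port if the URL has one.

```javascript
// Hypothetical stand-ins for the URI.js-based helpers from the question.
// Assumption: `url` is absolute, e.g. "https://www.airbnb.com/rooms/4606613".
const urlProtocol = (url) => url.split("/")[0].replace(":", ""); // e.g. "https"
const urlHostname = (url) => url.split("/")[2];                  // e.g. "www.airbnb.com"

// Only 3xx responses are redirects, so only they are worth checking for a Location header.
const isRedirectCode = (code) => code >= 300 && code < 400;
```
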
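/**
 * Builds a checker for a given redirect target.
 * @param {string} target URL the redirect chain is compared against
 */
const getRedirects = (target) =>

  /**
   * Follows HTTP redirects starting at url.
   * @param {string} url URL to check
   * @returns {boolean} false if the chain reaches target (or a 3xx response
   *   has no Location header); true if it ends on a non-redirect response
   */
  (url) => {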

    if(url === target) {
      return false;
    }

    const response = UrlFetchApp.fetch(url, {
      'followRedirects': false,
      'muteHttpExceptions': true
    });

    const code = response.getResponseCode();

    let { Location } = response.getHeaders();

    if (code < 300 || code >= 400) {
      return true;
    }

    if (!Location) {
      return false;
    }

    if (/^\/\w+/.test(Location)) {
      const [protocol, , base] = url.split("/");
      Location = `${protocol}//${base}${Location}`;
    }

    console.log(Location);
    
    return getRedirects(target)(Location);
  };

const testRedirects = () => {

  const redirectsToHome = getRedirects("https://www.airbnb.com/s/homes");

  const accessible = [
    "https://www.airbnb.com/rooms/23861670",
    "https://www.airbnb.com/rooms/4606613",
    "https://www.airbnb.com/rooms/4661522",
    "https://www.airbnb.com/rooms/6014647",
    "https://www.airbnb.com/rooms/14452305",
    "https://www.airbnb.com/rooms/15910617"
  ].filter(redirectsToHome);

  console.log(accessible);
};
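
The relative-Location handling in the code above covers values that start with a single slash. If other forms turn up, that step could be isolated into a small pure helper, sketched below. `resolveLocation` is a hypothetical name, and the sketch assumes that `currentUrl` is absolute and that no built-in URL parser is available in the script runtime.

```javascript
// Hypothetical helper: turn a Location header value into an absolute URL.
// Assumption: `currentUrl` is absolute, e.g. "https://www.airbnb.com/rooms/4606613".
const resolveLocation = (currentUrl, location) => {
  // Already absolute (has a scheme): use it as-is.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(location)) {
    return location;
  }

  const [protocol, , host] = currentUrl.split("/");
  const origin = `${protocol}//${host}`;

  // Protocol-relative, e.g. "//example.com/path".
  if (location.startsWith("//")) {
    return `${protocol}${location}`;
  }

  // Root-relative, e.g. "/s/homes".
  if (location.startsWith("/")) {
    return `${origin}${location}`;
  }

  // Path-relative: resolve against the "directory" of the current URL.
  const path = currentUrl.split(/[?#]/)[0].slice(origin.length);
  const dir = path.slice(0, path.lastIndexOf("/") + 1) || "/";
  return `${origin}${dir}${location}`;
};
```

Inside the checker, `Location = resolveLocation(url, Location);` would then replace the regular-expression branch.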

Since you clarified that the function is used as a custom function, you can add a wrapper that serves as the public API. Reference the wrapper in a cell and it will call the utility, something like this:

const checkIfRedirects = (source, target = "https://www.airbnb.com/s/homes") => getRedirects(target)(source);

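You can then use it like a formula (as written, the result is FALSE when the HTTP redirect chain reaches the target and TRUE when it ends on a non-redirect response):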

=checkIfRedirects(A20)
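
When a custom function is called with a range such as A1:A6 instead of a single cell, Apps Script passes it a two-dimensional array. A possible way to support both cases is sketched below. `mapCells` and `checkRedirectsInRange` are hypothetical names, and the default `check` assumes that the `checkIfRedirects` wrapper from above exists in the same project.

```javascript
// Hypothetical helper: apply `check` to a single value or to every cell of a 2D range.
const mapCells = (input, check) =>
  Array.isArray(input)
    ? input.map((row) => row.map((cell) => check(cell)))
    : check(input);

// Hypothetical custom function. The default `check` relies on `checkIfRedirects`
// being defined elsewhere in the project; it is only evaluated when no check is passed.
function checkRedirectsInRange(range, check = (url) => checkIfRedirects(url)) {
  return mapCells(range, check);
}
```

It could then be used as =checkRedirectsInRange(A1:A6). Each cell triggers its own chain of fetches, so large ranges may run into the custom function execution time limit.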

Upvotes: 2
