Jangari

Reputation: 868

Google Apps Script to find URLs in body and format them as hyperlinks

I have a block of text generated from a command line script which spins up a number of virtual machines. The text output contains instructions on how to access webapps on the virtual machines, so something like:

TrainingMachine01
Username: [user] 
Password: [pass] 
iPython: http://ip/ 
RStudio: http://ip:8787/

I take this text and dump it into a Google Doc which is shared with a number of people (we run courses in Python and R, and spin up a VM for each attendee).

I would like to be able to format my output as hyperlinks, so that the attendees only need to click on the URL rather than copy it and paste it into a browser (firstworldproblems).

After investigating ways of pasting text into Google Docs, I don't think there's a simpler solution than a Google Apps Script that finds patterns matching a URL and turns them into hyperlinks.

Here's what I have so far, based in large part on this answer to another question:

function updateLinks() {
  // Open active doc
  var body = DocumentApp.getActiveDocument().getBody();
  // Find URLs
  var link = body.findText("http:\/\/.*\/");

  // Loop through
  while (link != null) {
    // Get the link as an object
    var foundLink = link.getElement().asText();

    // Get the positions of start and end
    var start = link.getStartOffset();
    var end = link.getEndOffsetInclusive();

    // Format link
    foundLink.setLinkUrl(start, end, foundLink);

    // Find next
    link = body.findText("http:\/\/.*\/", link);
  }
}

My pattern and loop are working fine, except the URL that's being written into the hyperlink is either http://text if I use foundLink in the Format link section, or http://rangeelement if I use the link var.

How can I have the script set the URL as the text itself?

(I'm new to JavaScript and have been using exercises like this to learn it and Google Apps Script.)

Update: a-change's comment pointed me to the getText() method on text elements, so that the relevant line becomes foundLink.setLinkUrl(start, end, foundLink.getText());. However this is still not quite working, and is inserting links pointing to about:blank. Any ideas how to sanitise the text extracted from findText()?

Upvotes: 1

Views: 4947

Answers (2)

Andrew Mackenzie

Reputation: 5737

I tried the other regex examples above and elsewhere and had trouble reproducing the results. I suspect this is because Google Apps Script's findText() does not support full JavaScript regex syntax.

This works for me: it detects http and https links terminated by whitespace. I have tested it with links that start or end at the end of a line or paragraph, and with preceding and trailing text (separated by whitespace), and they all work.

function makeLinks() {
  var linkRegex = "https?:\/\/[^\\s]*";

  // Open active doc
  var body = DocumentApp.getActiveDocument().getBody();
  // Find URLs
  //Logger.log(body.findText("http").getElement().asText().getText());
  var link = body.findText(linkRegex);

  // Loop through the body finding texts matching the search pattern
  while (link != null) {
    // Get the link as an object
    var linkElement = link.getElement().asText();
    // Get the positions of start and end
    var start = link.getStartOffset();
    var end = link.getEndOffsetInclusive();

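    // Slice only the link out of the element text. The end offset is
    // inclusive but slice() excludes its end index, hence the + 1.
    var correctLink = linkElement.getText().slice(start, end + 1);
    //Logger.log("correctLink " + correctLink);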

    // Format link
    linkElement.setLinkUrl(start, end, correctLink);
    // Find next
    link = body.findText(linkRegex, link);
  }
}
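
To try the pattern outside a document, here is a rough sketch in plain JavaScript. It assumes the standard RegExp engine treats this simple pattern the same way findText() does, which is not guaranteed. The sample text and the findUrls name are made up for illustration.

```javascript
// Hypothetical helper: return every http/https URL in a string, with a
// start offset and an inclusive end offset, similar to what a
// RangeElement reports in Apps Script.
function findUrls(text) {
  // Same pattern as the linkRegex string above, written as a RegExp literal
  var linkRegex = /https?:\/\/[^\s]*/g;
  var results = [];
  var match;
  while ((match = linkRegex.exec(text)) !== null) {
    results.push({
      url: match[0],
      start: match.index,
      endInclusive: match.index + match[0].length - 1
    });
  }
  return results;
}

// Made-up sample line with text before and after each link
var sample = "iPython: http://ip/ and RStudio: https://ip:8787/ today";
var found = findUrls(sample);
```

Because [^\s]* stops only at whitespace, punctuation written directly after a URL (a trailing full stop, for instance) would become part of the link.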

I hope it helps someone else.

Upvotes: 0

a-change

Reputation: 666

I looked into it in more detail. If you log the value of foundLink.getText(), you'll see that it contains the whole string found on that line, i.e. RStudio: http://ip:8787/ instead of just http://ip:8787/. This probably happens because link.getElement() returns the whole element containing the found text, not just the matched range.

You could write all your links on separate lines and the function would work nicely, but the doc itself probably wouldn't look as good.

So what you need to do here is to additionally slice the link out of the foundLink.getText() string. Here's the slightly modified initial function:

function updateLinks() {
  // Open active doc
  var body = DocumentApp.getActiveDocument().getBody();
  // Find URLs
  //Logger.log(body.findText("http").getElement().asText().getText());
  var link = body.findText("http:\/\/.*\/");
  // Loop through
  while (link != null) {
    // Get the link as an object
    var foundLink = link.getElement().asText();
    // Get the positions of start and end
    var start = link.getStartOffset();
    var end = link.getEndOffsetInclusive();
    //check the value of foundLink if needed
    //Logger.log(foundLink.getText());
    // Slice only the link out of it. The end offset is inclusive but
    // slice() excludes its end index, hence the + 1.
    var correctLink = foundLink.getText().slice(start, end + 1);
    // Format link
    foundLink.setLinkUrl(start, end, correctLink);
    // Find next
    link = body.findText("http:\/\/.*\/", link);
  }
}
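
One detail worth checking in the slicing step: getEndOffsetInclusive() returns the index of the last matched character, while slice() treats its second argument as exclusive, so the slice needs end + 1 to keep the final character. A minimal sketch in plain JavaScript (no DocumentApp involved; the sample line and the extractMatch helper are made up for illustration) shows the difference:

```javascript
// Hypothetical helper: extract the matched substring given a start offset
// and an INCLUSIVE end offset, as a RangeElement reports them.
function extractMatch(elementText, start, endInclusive) {
  // slice() excludes its end index, so add 1 to keep the last character
  return elementText.slice(start, endInclusive + 1);
}

// Sample line like the ones in the question (made up for illustration)
var line = "RStudio: http://ip:8787/";
var start = line.indexOf("http");    // where the URL begins
var endInclusive = line.length - 1;  // index of the trailing "/"

// Passing the inclusive offset straight to slice() drops the final "/"
var withoutFix = line.slice(start, endInclusive);
// Adding 1 keeps the whole URL
var withFix = extractMatch(line, start, endInclusive);
```

Here withoutFix comes out as http://ip:8787 and withFix as http://ip:8787/. With a pattern that ends in a slash the missing character is often harmless, but with a pattern that ends on an arbitrary character it would truncate the URL.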

Upvotes: 3
