Greg Forel

Reputation: 2579

Google Apps Script regex: body.findText(searchPattern) returns null when the match spans new lines

Here's a Google Document content:

Some text, more text...

<li>
some lines
more lines...
</li>

And more text

I would like a regex to match:

<li>
...
</li>

So far it returns null. My regex finds <li>...</li> on a single line, but not when it spans new lines, even though I am using the (?s) flag, which was suggested to make . match any character, including new lines:

(?s)<li>(.)*?</li>

My regex works on https://regexr.com/ and https://regex101.com/, so I don't understand why it doesn't work in Google Apps Script.

Upvotes: 0

Views: 429

Answers (1)

Tanaike

Reputation: 201613

  • You want to retrieve the text of <li ...>....</li> in a Google Document.
  • You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.

Issue and workaround:

If you want to use the pattern <li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>, the backslashes must be doubled when the pattern is written as a string literal: <li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>. That alone is not enough in your case, though. From your sample value, I think <li ...>....</li> spans several paragraphs, and body.findText(searchPattern) appears to match within one paragraph at a time. So even when const searchPattern = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>' is used for body.findText(searchPattern), null is returned. If <li ...>....</li> is put in a single paragraph, body.findText(searchPattern) returns <li ...>....</li>.
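The escaping point can be checked in plain JavaScript, outside Apps Script. This is only a sketch: it uses the standard RegExp object rather than body.findText, and the variable names and the sample text are made up for illustration.

```javascript
// Single backslashes: "\s" is not a string escape, so the backslash is dropped.
const single = '<li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>';
// Doubled backslashes survive as real backslashes in the resulting string.
const doubled = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>';

console.log(single);  // <li sheet="[a-zA-Z0-9]*">[sS]*?</li>
console.log(doubled); // <li sheet="[a-zA-Z0-9]*">[\s\S]*?</li>

// Illustrative multi-line text (not a Google Document).
const text = '<li sheet="other">\n{{test}}\n</li>';
console.log(new RegExp(single).test(text));  // false
console.log(new RegExp(doubled).test(text)); // true
```

With single backslashes the character class silently becomes [sS], which only matches the letters s and S, so the pattern fails without raising any error.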

In order to search for <li ...>....</li> that spans several paragraphs, how about the following workaround? The flow of this workaround is as follows.

Flow:

  1. Use <li sheet= and <\/li> as patterns for searching.
  2. Using the pattern of <li sheet=, retrieve the start paragraph, which contains <li ...>.
  3. Using the pattern of <\/li>, retrieve the end paragraph, which contains </li>.
  4. Retrieve the text from the start paragraph through the end paragraph.
  5. Repeat this cycle until all <li ...>....</li> values have been found.
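The flow above can be tried without a Google Document by treating the document as a plain array of paragraph strings. This is a minimal sketch under that assumption: findBlocks and the sample array are hypothetical, and the search is a literal substring check instead of body.findText.

```javascript
// Hypothetical helper: paragraphs is a plain array of strings, one per paragraph.
function findBlocks(paragraphs, openPattern, closePattern) {
  const res = [];
  let i = 0;
  while (i < paragraphs.length) {
    // Steps 1-2: find the paragraph that holds the opening tag.
    if (!paragraphs[i].includes(openPattern)) {
      i++;
      continue;
    }
    // Step 3: find the first paragraph at or after it that holds the closing tag.
    let end = i;
    while (end < paragraphs.length && !paragraphs[end].includes(closePattern)) {
      end++;
    }
    if (end === paragraphs.length) break; // unclosed block: stop searching
    // Step 4: join the paragraphs from start through end.
    res.push({
      startIndex: i,
      endIndex: end + 1, // exclusive, like the script's endIndex
      texts: paragraphs.slice(i, end + 1).join(""),
    });
    // Step 5: continue the search after this block.
    i = end + 1;
  }
  return res;
}

// Made-up sample data, one string per paragraph.
const sample = [
  "Some text, more text...",
  '<li sheet="other">',
  "{{test}}",
  "</li>",
  "And more text",
];
const blocks = findBlocks(sample, "<li sheet=", "</li>");
console.log(JSON.stringify(blocks));
// [{"startIndex":1,"endIndex":4,"texts":"<li sheet=\"other\">{{test}}</li>"}]
```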

Sample script:

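function parseLists(body) {
  // var doc = DocumentApp.getActiveDocument();
  // var body = doc.getBody();

  var pattern1 = "<li sheet="; // matches the opening tag
  var pattern2 = "<\/li>";     // matches the closing tag
  var range1 = body.findText(pattern1);
  var res = [];
  while (range1) {
    // Search for the closing tag, starting after the current opening tag.
    var range2 = body.findText(pattern2, range1);
    if (!range2) break; // unclosed <li ...>: stop instead of failing on null

    var temp = {};
    // The matched text element's parent is the paragraph that contains it.
    // Assumes these paragraphs are direct children of the body.
    var p1 = range1.getElement().getParent();
    var p2 = range2.getElement().getParent();
    temp.startIndex = body.getChildIndex(p1);
    temp.endIndex = body.getChildIndex(p2) + 1; // exclusive

    var texts = "";
    for (var i = temp.startIndex; i < temp.endIndex; i++) {
      // Assumes every child in this range is a paragraph.
      texts += body.getChild(i).asParagraph().getText();
    }
    temp.texts = texts;
    res.push(temp);

    // Continue with the next opening tag after this closing tag.
    range1 = body.findText(pattern1, range2);
  }
  Logger.log(res);
  return res;
}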

Result:

When your sample values are put into a new Google Document and the script is run, the following result is retrieved.

[
  {
    "startIndex": 0,
    "endIndex": 5,
    "texts": "<li sheet=\"experiences\">{{company_name}},  {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}}</li>"
  },
  {
    "startIndex": 6,
    "endIndex": 9,
    "texts": "<li sheet=\"other\">{{test}}</li>"
  }
]

  • For the above result, if you want to retrieve the values of {{company_name}}, {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}} and {{test}} without the tags, please modify the above script as follows.

    • From:

      for (var i = temp.startIndex; i < temp.endIndex; i++) {
      
    • To:

      for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
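Note that this modification skips the whole first and last paragraphs, which fits when each tag sits in its own paragraph. If a tag might share a paragraph with content, one alternative is to keep the original loop and strip the tags from the joined text afterwards. The helper below is a hypothetical sketch in plain JavaScript, not part of the script above.

```javascript
// Hypothetical helper: remove the opening <li ...> tag and the closing </li> tag
// from the joined text of one block.
function stripLiTags(texts) {
  return texts
    .replace(/^<li\b[^>]*>/, "") // opening tag at the start, with any attributes
    .replace(/<\/li>$/, "");     // closing tag at the end
}

console.log(stripLiTags('<li sheet="other">{{test}}</li>')); // {{test}}
```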
      

If I misunderstood your question and this was not the direction you want, I apologize.

Upvotes: 2
