Dan

Reputation: 105

JS regex: one correct match out of three and one false match

This JS regex error is killing me - one correct match out of three and one false match.

If it makes a difference I am writing my script in Google Apps Script.

I have an XML-formatted string in which I want to match three date nodes, as follows:

<dateCreated>1619155581543</dateCreated>
<dispatchDate>1619478000000</dispatchDate>
<deliveryDate>1619564400000</deliveryDate>

I don't care about the tags so much - I just need enough to reliably replace them. I am using this regular expression:

var regex = new RegExp('[dD]ate(.{1,})?>[0-9]{13,}</');

These are the matches:

  1. dateCreated>1619155581543</
  2. Created

Obviously I understand number 1 - I wanted that. But I do not understand how 2 was matched. Also why were dispatchDate and deliveryDate not matched? All three targets are matched if I use the above regex in BBEdit and on https://ihateregex.io/playground and neither of those match "Created".

I've also tried this regular expression without success:

var regex = new RegExp('[dD]ate.{0,}>[0-9]{13,}</');

If you can't answer why my regex fails but you can offer a working solution I'd still be happy with that.

Upvotes: 0

Views: 35

Answers (1)

The fourth bird

Reputation: 163362

The first pattern that you tried [dD]ate(.{1,})?>[0-9]{13,}</ matches:

  • [dD]ate Match date or Date
  • (.{1,})? Optional capture group, match 1+ times any char (This group will capture Created)
  • > Match literally
  • [0-9]{13,} Match 13 or more digits 0-9
  • </ Match literally

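What you will get are partial matches running from date to </, and the first capture group will contain Created. If the result comes from match() or exec() without the g flag, the returned array holds the full match followed by its capture groups. That is why Created shows up as a second entry, and why only the first node is reported.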

The second pattern is almost the same, except for {0,} which matches 0 or more times, and there is no capture group.

Still this will give you partial matches.
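
To make the behaviour above concrete, here is a minimal sketch. It assumes the XML has one node per line and that the matches were read from the array returned by match(); neither detail is stated in the question.

```javascript
// Assumed input: the three nodes from the question, one per line.
const xml = [
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
].join("\n");

// Without the g flag, match() returns the first full match
// followed by the contents of its capture groups.
const single = xml.match(new RegExp("[dD]ate(.{1,})?>[0-9]{13,}</"));
console.log(single[0]); // full match
console.log(single[1]); // capture group 1

// With the g flag, match() returns every full match and no capture groups.
const all = xml.match(new RegExp("[dD]ate(.{1,})?>[0-9]{13,}</", "g"));
console.log(all);
```

With the g flag all three nodes are found, but the matches are still partial: they start at date or Date rather than at the opening <.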


What you could do to match the whole element is either harness the power of an XML parser (which would be the recommended way) or use a pattern that assumes only digits between the tags and no < or > characters inside the opening and closing tag names.

Note that this is a brittle solution.

<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>
  • < Match literally
  • ( Capture group 1 (This group is used for the backreference \1 at the end of the pattern)
    • [^<>]* Match 0+ times any character except < or >
    • [dD]ate[^<>]* Match either date or Date, followed by 0+ times any character except < or >
  • ) Close group 1
  • > Match literally
  • \d{13} Match 13 digits (or \d{13,} for 13 or more)
  • <\/\1> Match </ then a backreference to the exact text that is captured in group 1 (to match the name of the closing tag) and then match >


A slightly more restrictive pattern allows only word characters (\w) around date:

<(\w*[dD]ate\w*)>\d{13}<\/\1>
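
As a rough illustration of how the two patterns differ, the sketch below runs both against a plain tag name and a namespaced one. The namespaced input is a hypothetical example, not taken from the question; it is there only because the colon is not a word character.

```javascript
// The two patterns from this answer.
const loose = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;
const strict = /<(\w*[dD]ate\w*)>\d{13}<\/\1>/;

// Hypothetical inputs: a plain tag name and a namespaced tag name.
const samples = [
  "<dispatchDate>1619478000000</dispatchDate>",
  "<ns:dispatchDate>1619478000000</ns:dispatchDate>",
];

// Neither pattern has the g flag, so test() keeps no state between calls.
const results = samples.map(str => ({
  str,
  loose: loose.test(str),
  strict: strict.test(str),
}));
console.log(results);
```

Both patterns accept the plain tag; only the [^<>]* version accepts the namespaced one.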

const regex = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;
[
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
  "<thirteendigits>1619564400000</thirteendigits>",
].forEach(str => {
  const match = str.match(regex);
  console.log(match ? `Match --> ${str}` : `No match --> ${str}`)
});
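
Since the goal in the question is to replace the nodes, here is a sketch of one way that could look. It uses the word-character pattern from this answer with two changes of my own: the g flag, and a second capture group around the digits. The ISO 8601 conversion is only a hypothetical replacement, as the question does not say what the new value should be.

```javascript
// Assumed input: the three nodes from the question plus one node
// that contains 13 digits but is not a date node.
const sampleXml = [
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
  "<thirteendigits>1619564400000</thirteendigits>",
].join("\n");

// Group 1 is the tag name (reused by \1), group 2 is the timestamp.
const dateNode = /<(\w*[dD]ate\w*)>(\d{13})<\/\1>/g;

// Hypothetical replacement: rewrite the millisecond timestamp as an ISO 8601 string.
const converted = sampleXml.replace(dateNode, (whole, tag, millis) =>
  `<${tag}>${new Date(Number(millis)).toISOString()}</${tag}>`
);
console.log(converted);
```

The g flag makes replace() visit every matching node instead of stopping after the first one, and the thirteendigits node is left untouched because its name does not contain date or Date.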

Upvotes: 1
