Przemek

Reputation: 317

JScript Regex - extract dates preceded by substrings

I've got a one-line string that includes several dates. In JScript Regex I need to extract the dates that are preceded by the case-insensitive substrings "dat" and "wy", in that order. The substrings can be preceded and followed by any character (except a newline).

var reg = new RegExp('dat.{0,}wy.{0,}\\d{1,4}([\-/ \.])\\d{1,2}([\-/ \.])\\d{1,4}','ig');
var str = 'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30';
var result = str.match(reg).toString();

Received result: 'Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30'
Expected result: 'Dat   wy.03/12/2019,data wy2020-09-30' or preferably: '03/12/2019,2020-09-30' 

Thanks.

Upvotes: 0

Views: 113

Answers (2)

bobble bubble

Reputation: 18515

Several issues.

  1. You want to match as little as possible between the substrings and the date, but your current regex uses the greedy .{0,} (the same as .*). See this Question and use .*? instead.
  2. dat.*?wy.*?FOO can still skip over another dat before it reaches wy. To prevent that, use what some call a Tempered Greedy Token: the .*? becomes (?:(?!dat).)*?, which cannot run past another dat.
  3. Not really an issue, but you can capture the date separator and reuse it.
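
To make points 1 and 2 concrete, here is a small sketch that runs three variants against the string from the question. The variable names and the shared `date` fragment are only illustrative, and it assumes an engine that supports lazy quantifiers and lookahead (JScript 5.5 or later should):

```javascript
var str = 'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30';

// Shared date part (illustrative); \1 reuses the captured separator.
var date = '\\d{1,4}([/ .-])\\d{1,2}\\1\\d{1,4}';

var greedy   = new RegExp('dat.*wy.*' + date, 'ig');
var lazy     = new RegExp('dat.*?wy.*?' + date, 'ig');
var tempered = new RegExp('dat(?:(?!dat).)*?wy.*?' + date, 'ig');

// With the g flag, match() returns the whole matches (no groups).
var greedyResult   = str.match(greedy);
// one match spanning from 'Dat' to the end of the string

var lazyResult     = str.match(lazy);
// ['Dat   wy.03/12/2019', 'Dato dost2009/03/03**data wy2020-09-30']

var temperedResult = str.match(tempered);
// ['Dat   wy.03/12/2019', 'data wy2020-09-30']
```

The lazy version still starts its second match at "Dato" and runs on to the next "wy", which is exactly what the tempered token prevents.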

If you want to extract only the date part, also use capturing groups. I put a demo at regex101.

dat(?:(?!dat).)*?wy.*?(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
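
A minimal sketch of pulling only the dates out with this pattern, using an exec() loop so it does not rely on newer methods. The helper name `extractDates` is made up for this example:

```javascript
// Hypothetical helper: collect capture group 1 from every match.
function extractDates(regex, text) {
  var dates = [];
  var m;
  regex.lastIndex = 0; // a global regex keeps its position between exec() calls
  while ((m = regex.exec(text)) !== null) {
    dates.push(m[1]); // group 1 is the date, group 2 is the separator
  }
  return dates;
}

var str = 'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30';
var reg = /dat(?:(?!dat).)*?wy.*?(\d{1,4}([\/ .-])\d{1,2}\2\d{1,4})/ig;

var result = extractDates(reg, str).toString();
// result: '03/12/2019,2020-09-30'
```

Note that str.match(reg) with the g flag would return the whole matches, not the groups, which is why the loop is used here.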

There are many ways to get the result you want. Another idea: if you know that no digits will ever appear between the substrings and the date, use \D (non-digit) instead of the dot.

dat\D*?wy\D*(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
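
A quick sketch of the \D variant, including an input where its assumption does not hold. The second string and the function name `datesViaNonDigits` are invented for illustration:

```javascript
var nonDigit = /dat\D*?wy\D*(\d{1,4}([\/ .-])\d{1,2}\2\d{1,4})/ig;

// Illustrative helper: gather group 1 (the date) from each match.
function datesViaNonDigits(text) {
  var found = [];
  var m;
  nonDigit.lastIndex = 0; // reset state left over from a previous run
  while ((m = nonDigit.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

var fromQuestion = datesViaNonDigits(
  'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30');
// ['03/12/2019', '2020-09-30']

// Made-up input with a digit between "dat" and "wy": \D cannot cross it.
var withStrayDigit = datesViaNonDigits('dat 1 wy 2020-09-30');
// []
```

So this version is simpler, but only safe when the no-digits assumption really holds for your data.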

Upvotes: 2

The fourth bird

Reputation: 163362

You might use a capturing group with a backreference to make sure the separators like - and / are the same in the matched date.

\bdat\w*\s*wy\.?(\d{4}([-/ .])\d{2}\2\d{2}|\d{2}([-/ .])\d{2}\3\d{4})
  • \bdat\w*\s*wy\.? A word boundary, match dat followed by 0+ word chars and 0+ whitespace chars. Then match wy and an optional .
  • ( Capture group 1
    • \d{4}([-/ .])\d{2}\2\d{2} Match a date like format starting with the year where \2 is a backreference to what is captured in group 2
    • | Or
    • \d{2}([-/ .])\d{2}\3\d{4} Match a date like format ending with the year where \3 is a backreference to what is captured in group 3
  • ) Close group

The value is in capture group 1
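
For illustration, a short sketch of reading group 1 with this pattern in an exec() loop (variable names are arbitrary):

```javascript
var str = 'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30';
var pattern = /\bdat\w*\s*wy\.?(\d{4}([-\/ .])\d{2}\2\d{2}|\d{2}([-\/ .])\d{2}\3\d{4})/ig;

var values = [];
var match;
while ((match = pattern.exec(str)) !== null) {
  values.push(match[1]); // group 1 holds the date in either layout
}

var joined = values.join(',');
// joined: '03/12/2019,2020-09-30'
```

"Dato dost2009/03/03" is skipped because "wy" has to follow directly after the word starting with "dat" and any whitespace.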

Regex demo

Note that you could make the date more specific by specifying ranges for the year, month and day.
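
As a rough sketch of what such ranges might look like: the version below assumes years 1900 to 2099, months 01 to 12 and days 01 to 31, and it assumes that the year-last layout is day/month/year. Those choices are assumptions for the example, not something stated in the question, and it still does not check days per month.

```javascript
// Assumed ranges: year 19xx/20xx, month 01-12, day 01-31.
// Assumed layouts: yyyy-mm-dd or dd-mm-yyyy, same separator twice.
var strict = new RegExp(
  '\\bdat\\w*\\s*wy\\.?(' +
    '(?:19|20)\\d{2}([-/ .])(?:0[1-9]|1[0-2])\\2(?:0[1-9]|[12]\\d|3[01])' +
  '|' +
    '(?:0[1-9]|[12]\\d|3[01])([-/ .])(?:0[1-9]|1[0-2])\\3(?:19|20)\\d{2}' +
  ')', 'ig');

// Illustrative helper: gather group 1 from each match.
function strictDates(text) {
  var out = [];
  var m;
  strict.lastIndex = 0;
  while ((m = strict.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

var ok = strictDates(
  'abc18.Dat   wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30');
// ['03/12/2019', '2020-09-30']

var rejected = strictDates('data wy2020-13-45'); // month 13 is out of range
// []
```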

Upvotes: 2
