danbeggan

Reputation: 105

Regex Matching everything between 2 repeating values

I am making a reminder application and I want to support importing iCalendar files, so I need to be able to extract individual events. This is the basic format of an event in iCalendar:

BEGIN:VEVENT
......
......
END:VEVENT

All of these events are in one file so I will have a big list like this:

BEGIN:VEVENT
......
......
END:VEVENT
BEGIN:VEVENT
......
......
END:VEVENT

Each event has a start date and an end date:

BEGIN:VEVENT
......
DTSTART;VALUE=DATE:20160402
DTEND;VALUE=DATE:20160403
......
END:VEVENT

The fields inside an event are not always in the same order: the start date and end date can come before or after certain other fields.

Currently I have:

/BEGIN:VEVENT[\s\S]*?DTSTART;VALUE=DATE:20160402[\s\S]*?END:VEVENT/

However, this doesn't match just the event itself. It starts at the first BEGIN:VEVENT in the file, matches everything up to the date, and then ends at the next END:VEVENT.

So for events further down the list, the match also includes all the events before them. Is there a way to match the DTSTART;VALUE=DATE: line together with only the nearest preceding BEGIN:VEVENT and the following END:VEVENT, so that I extract just the single event for that day?

Upvotes: 1

Views: 152

Answers (1)

Wiktor Stribiżew

Reputation: 627020

The problem can be solved with a tempered greedy token, which finds the smallest possible window between two substrings in a text. Since your text spans multiple lines, you cannot use the . atom to match any character, because it does not match line breaks; use either [^] or [\s\S] instead.
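As a quick check of the point about line breaks, here is a minimal sketch. The sample string is made up for illustration, and the s (dotAll) flag in the last line is an addition not mentioned above; it is only available in ES2018 and later engines.

```javascript
// Made-up three-line sample, only for illustration.
const text = 'BEGIN:VEVENT\nSUMMARY:Test\nEND:VEVENT';

// . does not match line breaks, so this pattern cannot span the three lines.
const withDot = /BEGIN:VEVENT.*END:VEVENT/.test(text);

// [\s\S] (whitespace or non-whitespace) matches every character, line breaks included.
const withClass = /BEGIN:VEVENT[\s\S]*END:VEVENT/.test(text);

// [^] (an empty negated class) also matches every character, but is JavaScript-specific.
const withEmptyNegation = /BEGIN:VEVENT[^]*END:VEVENT/.test(text);

// Assumption: the engine supports the ES2018 s flag, which lets . match line breaks.
const withDotAll = /BEGIN:VEVENT.*END:VEVENT/s.test(text);

console.log(withDot, withClass, withEmptyNegation, withDotAll); // false true true true
```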

So, use

/BEGIN:VEVENT((?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])*DTSTART;VALUE=DATE:20160402[\s\S]*?)END:VEVENT/g

See the regex demo

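The (?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])* part matches any run of characters that does not contain BEGIN:VEVENT or END:VEVENT (as whole words, due to the \b word boundaries). Before consuming each character, the negative lookahead checks that neither marker starts at that position, so the match can never cross into another event.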

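var re = /BEGIN:VEVENT((?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])*DTSTART;VALUE=DATE:20160402[\s\S]*?)END:VEVENT/g;
var str = 'BEGIN:VEVENT\n......\n......\nEND:VEVENT\nBEGIN:VEVENT\n......\n......\nEND:VEVENT\nThese events will have a start date and an end date\n\nBEGIN:VEVENT\n......\nDTSTART;VALUE=DATE:20160402\nDTEND;VALUE=DATE:20160403\n......\nEND:VEVENT';
var res = [];
var m; // declared explicitly so the loop does not create an implicit global

// With the g flag, each exec() call resumes after the previous match.
while ((m = re.exec(str)) !== null) {
    res.push(m[0]); // m[0] is the whole event; m[1] is the text between BEGIN:VEVENT and END:VEVENT
}

// Show the matches on the page, one event per array entry.
document.body.innerHTML = "<pre>" + JSON.stringify(res.map(x => x.replace(/\r?\n/g, "<br/>")), null, 4) + "</pre>";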
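To see why the tempered token matters, the following sketch compares the lazy pattern from the question with the tempered one on a small made-up sample of two events, only the second of which starts on 20160402. The sample data and variable names are assumptions for illustration.

```javascript
// Made-up sample: two events, only the second starts on 2016-04-02.
const ics = [
  'BEGIN:VEVENT',
  'DTSTART;VALUE=DATE:20160401',
  'END:VEVENT',
  'BEGIN:VEVENT',
  'DTSTART;VALUE=DATE:20160402',
  'END:VEVENT',
].join('\n');

// Lazy pattern from the question: it starts at the first BEGIN:VEVENT it finds
// and expands until the date appears, even if that means crossing other events.
const lazy = /BEGIN:VEVENT[\s\S]*?DTSTART;VALUE=DATE:20160402[\s\S]*?END:VEVENT/;

// Tempered pattern: it cannot step over another BEGIN:VEVENT or END:VEVENT
// before reaching the date, so the attempt at the first event fails.
const tempered = /BEGIN:VEVENT(?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])*DTSTART;VALUE=DATE:20160402[\s\S]*?END:VEVENT/;

const lazyMatch = ics.match(lazy)[0];
const temperedMatch = ics.match(tempered)[0];

console.log(lazyMatch.split('\n').length);     // 6: both events are swallowed
console.log(temperedMatch.split('\n').length); // 3: only the event for that day
```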

Note that the trailing [\s\S]*? can also be replaced with the same tempered greedy token. That does not seem necessary here, because the VEVENT blocks are well-formed and not nested, so the lazy match stops at the event's own END:VEVENT. If nested VEVENT blocks can occur, replace [\s\S]*? with the tempered greedy token as well.
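A minimal sketch of that stricter variant, with the tempered token on both sides of the date, might look like this. The helper names eventRegexForDate and extractEvents are hypothetical, and the sketch assumes all-day events whose start line has exactly the form DTSTART;VALUE=DATE:YYYYMMDD.

```javascript
// Hypothetical helper: builds the regex for one all-day start date (YYYYMMDD).
function eventRegexForDate(yyyymmdd) {
  // Only digits are accepted, so the value needs no regex escaping.
  if (!/^\d{8}$/.test(yyyymmdd)) {
    throw new Error('expected a date in YYYYMMDD form');
  }
  // One character that does not start BEGIN:VEVENT or END:VEVENT.
  const token = '(?:(?!\\b(?:END|BEGIN):VEVENT\\b)[\\s\\S])';
  return new RegExp(
    'BEGIN:VEVENT' + token + '*DTSTART;VALUE=DATE:' + yyyymmdd + token + '*END:VEVENT',
    'g'
  );
}

// Hypothetical helper: returns every event block that starts on the given date.
function extractEvents(text, yyyymmdd) {
  return text.match(eventRegexForDate(yyyymmdd)) || [];
}

// Made-up sample with two events on different days.
const sample = [
  'BEGIN:VEVENT',
  'SUMMARY:A',
  'DTSTART;VALUE=DATE:20160401',
  'END:VEVENT',
  'BEGIN:VEVENT',
  'SUMMARY:B',
  'DTSTART;VALUE=DATE:20160402',
  'DTEND;VALUE=DATE:20160403',
  'END:VEVENT',
].join('\n');

const found = extractEvents(sample, '20160402');
console.log(found.length); // 1
console.log(found[0]);
```

DTEND does not trip the lookahead, because there is no word boundary between the T and the E, so \bEND cannot match inside it.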

Upvotes: 1
