Reputation: 105
I am making a reminder application and I want to support iCalendar importing, so I need to be able to extract events. This is the basic format of an event in iCalendar:
BEGIN:VEVENT
......
......
END:VEVENT
All of these events are in one file so I will have a big list like this:
BEGIN:VEVENT
......
......
END:VEVENT
BEGIN:VEVENT
......
......
END:VEVENT
These events will have a start date and an end date:
BEGIN:VEVENT
......
DTSTART;VALUE=DATE:20160402
DTEND;VALUE=DATE:20160403
......
END:VEVENT
The events do not always have the same layout: the start date and end date can appear before or after certain other fields, which makes it hard to extract just one event.
Currently I have:
/BEGIN:VEVENT[\s\S]*?DTSTART;VALUE=DATE:20160402[\s\S]*?END:VEVENT/
However, this doesn't match just the event itself. It starts at the first BEGIN:VEVENT, matches everything up to the date, and then ends at the following END:VEVENT.
So for events further down the list, the match includes lots of other events. Is there a way I can match the DTSTART;VALUE=DATE: line together with only the nearest preceding BEGIN:VEVENT and the following END:VEVENT, so that I extract just the single event for that day?
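To make the symptom concrete, here is a minimal sketch that reproduces the over-match. The two-event sample string is made up for illustration; only the pattern comes from the question.

```javascript
// Hypothetical sample: two events, only the second one starts on 20160402.
const sample = [
  'BEGIN:VEVENT',
  'DTSTART;VALUE=DATE:20160401',
  'END:VEVENT',
  'BEGIN:VEVENT',
  'DTSTART;VALUE=DATE:20160402',
  'END:VEVENT'
].join('\n');

// The pattern from the question (without the g flag, so match() returns one result).
const lazyPattern = /BEGIN:VEVENT[\s\S]*?DTSTART;VALUE=DATE:20160402[\s\S]*?END:VEVENT/;
const lazyMatch = sample.match(lazyPattern)[0];

// A lazy quantifier only keeps the right-hand end of the match short;
// the match still starts at the first BEGIN:VEVENT the engine tries.
const beginCount = lazyMatch.split('BEGIN:VEVENT').length - 1;
console.log(beginCount); // 2: both events ended up in a single match
```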
Upvotes: 1
Views: 152
Reputation: 627020
The problem can be solved with a tempered greedy token, which lets you match the smallest possible window between two substrings in a text. Since your text spans multiple lines, the . atom will not work on its own because it does not match line breaks. You need to use either [^] (a JavaScript-only construct) or [\s\S] to match any character.
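As a quick illustration of that difference, the following sketch tests three one-character patterns against a made-up two-line string. The variable names are only for this example.

```javascript
// Hypothetical two-line input: the markers are separated by a single line break.
const text = 'BEGIN:VEVENT\nEND:VEVENT';

// Without the s flag, . does not match a line feed.
const dotMatches = /BEGIN:VEVENT.END:VEVENT/.test(text);

// [\s\S] (whitespace or non-whitespace) matches any character, including \n.
const classMatches = /BEGIN:VEVENT[\s\S]END:VEVENT/.test(text);

// [^] is an empty negated class; in JavaScript it also matches any character.
const emptyNegatedMatches = /BEGIN:VEVENT[^]END:VEVENT/.test(text);

console.log(dotMatches, classMatches, emptyNegatedMatches); // false true true
```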
So, use
/BEGIN:VEVENT((?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])*DTSTART;VALUE=DATE:20160402[\s\S]*?)END:VEVENT/g
See the regex demo
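The (?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])* part matches any run of text that does not contain BEGIN:VEVENT or END:VEVENT (as whole words, due to the \b word boundaries). Because it cannot cross either marker, the match has to begin at the BEGIN:VEVENT nearest to the date.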
var re = /BEGIN:VEVENT((?:(?!\b(?:END|BEGIN):VEVENT\b)[\s\S])*DTSTART;VALUE=DATE:20160402[\s\S]*?)END:VEVENT/g;
var str = 'BEGIN:VEVENT\n......\n......\nEND:VEVENT\nBEGIN:VEVENT\n......\n......\nEND:VEVENT\nThese events will have a start date and an end date\n\nBEGIN:VEVENT\n......\nDTSTART;VALUE=DATE:20160402\nDTEND;VALUE=DATE:20160403\n......\nEND:VEVENT';
var res = [], m; // declare m so it does not become an implicit global
while ((m = re.exec(str)) !== null) {
  res.push(m[0]); // m[0] is the whole event; m[1] is the text between the two markers
}
// Show the matched events on the page
document.body.innerHTML = "<pre>" + JSON.stringify(res.map(x => x.replace(/\r?\n/g, "<br/>")), null, 4) + "</pre>";
Note that [\s\S]*? can also be replaced with the above tempered greedy token, but that does not seem necessary since the VEVENT blocks are well-formed and there are no nested VEVENT blocks. If there are nested VEVENT blocks, the [\s\S]*? should be replaced with the tempered greedy token.
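If the date needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: findEventsOn and the sample data are hypothetical names introduced here, and it uses the tempered greedy token on both sides of the date. It also assumes the start date is always written in the DTSTART;VALUE=DATE: form shown in the question. Other DTSTART forms that iCalendar allows, such as date-time values, are not handled.

```javascript
// Hypothetical helper (not part of the original answer): returns every
// VEVENT block whose DTSTART;VALUE=DATE: line equals the given date.
function findEventsOn(icsText, yyyymmdd) {
  // Accept digits only, so nothing in the date needs regex escaping.
  if (!/^\d{8}$/.test(yyyymmdd)) {
    throw new Error('Expected a date in YYYYMMDD form');
  }
  // Zero or more characters, none of which begins a BEGIN:VEVENT
  // or END:VEVENT marker (the tempered greedy token).
  const tempered = '(?:(?!\\b(?:END|BEGIN):VEVENT\\b)[\\s\\S])*';
  const re = new RegExp(
    'BEGIN:VEVENT' + tempered +
    'DTSTART;VALUE=DATE:' + yyyymmdd +
    tempered + 'END:VEVENT',
    'g'
  );
  return icsText.match(re) || [];
}

// Made-up sample: only the second event starts on 20160402.
const ics = [
  'BEGIN:VEVENT',
  'SUMMARY:First',
  'DTSTART;VALUE=DATE:20160401',
  'DTEND;VALUE=DATE:20160402',
  'END:VEVENT',
  'BEGIN:VEVENT',
  'SUMMARY:Second',
  'DTSTART;VALUE=DATE:20160402',
  'DTEND;VALUE=DATE:20160403',
  'END:VEVENT'
].join('\n');

const found = findEventsOn(ics, '20160402');
console.log(found.length); // 1
console.log(found[0]);     // the block containing SUMMARY:Second
```

For anything beyond simple files, splitting the text into VEVENT blocks first and then inspecting each block's properties, or using a dedicated iCalendar parser, is likely to be more robust than a single regex.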
Upvotes: 1