I'm trying to write a regex for JavaScript's split(), but I'm totally stuck. Here's my input:
9:30 pm
The user did action A.
10:30 pm
Welcome, user John Doe.
***This is a comment
11:30 am
This is some more input.
I want the output array from split() to be (I've removed the \n characters for readability):
["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];
My current regular expression is:
var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);
This almost works, but there is one problem: whatever the group inside the lookahead matches (each timestamp, and the *** marker too) shows up as an extra element. So I get:
["9:30 pm The user did action A.", "10:30", "10:30 pm Welcome, user John Doe.", "***", "***This is a comment", "11:30", "11:30 am This is some more input."];
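A minimal reproduction, using the sample input with "\n" as the line break (an assumption; the real text may differ). The comment shows the array a spec-compliant engine returns: each captured group value is inserted right after the piece that precedes its separator.

```javascript
// Sample input from the question. Using "\n" as the line break is an
// assumption; the real text may use other whitespace, or none at all.
const text =
  "9:30 pm\nThe user did action A.\n10:30 pm\nWelcome, user John Doe.\n" +
  "***This is a comment\n11:30 am\nThis is some more input.";

// The (...) inside the lookahead is a capturing group, so split() inserts
// whatever it captured right after each piece it produces.
const parts = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

// parts →
// ["9:30 pm\nThe user did action A.", "10:30",
//  "10:30 pm\nWelcome, user John Doe.", "***",
//  "***This is a comment", "11:30",
//  "11:30 am\nThis is some more input."]
console.log(parts);
```

No leading "9:30" element appears: the empty match at position 0 is skipped by split().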
I can't split on the newlines (\n) because they aren't consistent, and sometimes there may be no newlines at all.
Could you help me out with a Regex for this?
Thanks so much!!
EDIT: in reply to phleet
It could look like this:
9:30 pm
The user did action A.
He also did action B
10:30 pm Welcome, user John Doe.
Basically, there may or may not be a newline after the timestamp, and the event description may span multiple lines.
I believe the issue is with how JavaScript's split() treats capturing groups: if the separator regex contains capturing groups, the text they capture is spliced into the result array. The solution may just be to use a non-capturing group in your pattern. That is, instead of:
/\s*(?=(\b\d+:\d+|\*\*\*))/
Use
/\s*(?=(?:\b\d+:\d+|\*\*\*))/
        ^^
The (?:___) syntax is what is called a non-capturing group: it groups like (___) but doesn't capture, so split() has nothing extra to insert.
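A small sketch of the difference in isolation (the string "a1b2c" is just an illustration, not from the question), followed by the non-capturing pattern applied to the question's input, again assuming "\n" line breaks:

```javascript
// Capturing group: each captured separator is spliced into the result.
const withCapture = "a1b2c".split(/(\d)/);
// withCapture → ["a", "1", "b", "2", "c"]

// Non-capturing group: same separator, but nothing extra is inserted.
const withoutCapture = "a1b2c".split(/(?:\d)/);
// withoutCapture → ["a", "b", "c"]

// The question's input (assuming "\n" line breaks).
const text =
  "9:30 pm\nThe user did action A.\n10:30 pm\nWelcome, user John Doe.\n" +
  "***This is a comment\n11:30 am\nThis is some more input.";
const parts = text.split(/\s*(?=(?:\b\d+:\d+|\*\*\*))/);
// parts →
// ["9:30 pm\nThe user did action A.", "10:30 pm\nWelcome, user John Doe.",
//  "***This is a comment", "11:30 am\nThis is some more input."]

console.log(withCapture, withoutCapture, parts);
```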
Looking at the overall pattern, however, the grouping is not actually needed, because the lookahead's own parentheses already delimit the alternation. You should be able to just use:
/\s*(?=\b\d+:\d+|\*\*\*)/
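A sketch of the ungrouped pattern wrapped in a helper (splitEntries is a hypothetical name) and run on the original input, the variant from the EDIT, and a version with no newlines at all. The trim() call and the whitespace-collapsing map() are assumptions added so the result matches the "\n removed" form of the desired output; drop them if you want the raw pieces.

```javascript
// Hypothetical helper: splits a log into entries, where each entry starts
// at a timestamp such as "9:30" or at a "***" comment marker.
function splitEntries(text) {
  return text
    .trim() // avoid an empty first element if the text starts with whitespace
    // \s* consumes any whitespace before the marker (newline, space, or none);
    // the lookahead keeps the marker itself at the start of the next entry.
    .split(/\s*(?=\b\d+:\d+|\*\*\*)/)
    // Optional: collapse whitespace runs inside an entry to single spaces.
    .map((entry) => entry.replace(/\s+/g, " "));
}

// Original input, the variant from the EDIT, and a version with no newlines.
const original =
  "9:30 pm\nThe user did action A.\n10:30 pm\nWelcome, user John Doe.\n" +
  "***This is a comment\n11:30 am\nThis is some more input.";
const edited =
  "9:30 pm\nThe user did action A.\nHe also did action B\n" +
  "10:30 pm Welcome, user John Doe.";
const noNewlines =
  "9:30 pm The user did action A. 10:30 pm Welcome, user John Doe. " +
  "***This is a comment 11:30 am This is some more input.";

console.log(splitEntries(original));
// → ["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.",
//    "***This is a comment", "11:30 am This is some more input."]
console.log(splitEntries(edited));
// → ["9:30 pm The user did action A. He also did action B",
//    "10:30 pm Welcome, user John Doe."]
console.log(splitEntries(noNewlines));
// → same array as for `original`
```

One caveat: \b\d+:\d+ also matches a time mentioned inside a description (for example "at 5:00"), which would start a new entry there.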
Instead of \*\*\*, you could use [*]{3}, which may be more readable. The * is not a metacharacter inside a character class, so it doesn't have to be escaped, and {3} means "exactly 3 repetitions of".
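A quick check that the two spellings behave the same when used in the split pattern (the sample string is a trimmed copy of the question's input, assuming "\n" line breaks):

```javascript
// Inside a character class, * is a literal asterisk, so [*]{3} and \*\*\*
// both match exactly three asterisks in a row.
const escaped = /\s*(?=\b\d+:\d+|\*\*\*)/;
const charClass = /\s*(?=\b\d+:\d+|[*]{3})/;

const sample =
  "10:30 pm\nWelcome, user John Doe.\n***This is a comment\n" +
  "11:30 am\nThis is some more input.";

const viaEscaped = sample.split(escaped);
const viaCharClass = sample.split(charClass);
// Both →
// ["10:30 pm\nWelcome, user John Doe.", "***This is a comment",
//  "11:30 am\nThis is some more input."]

console.log(/^[*]{3}$/.test("***")); // true
console.log(/^[*]{3}$/.test("**")); // false
```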