JavaScript regex to split by lines starting with a pattern

Question

My goal is to extract messages from an exported conversation that looks like this:

inputText = `3/24/18 - Username : message here
3/24/18 - anotherUser : another message`

What I tried

My naive approach was to split at every new line. I used arr = inputText.match(/[^\r\n]+/g) (Source: JS regex to split by line), which works perfectly for single-line messages.

But now I'm facing a case I didn't think about earlier: a user can send a multi-line message, like:

inputText = `3/24/18 - Username : message here,
other text, same message
3/24/18 - anotherUser : another message`

With this input, my naive approach produces the wrong output:

arr = ['3/24/18 - Username : message here,',
       'other text, same message',
       '3/24/18 - anotherUser : another message']

while I need it to be like this:

arr = ['3/24/18 - Username : message here, other text, same message',
       '3/24/18 - anotherUser : another message']

I need to split at a line break, but only when the next line starts with the pattern m/d/y - username :

trincot · Accepted Answer

If your message lines always start with a date, formatted as in your example, then you could match that. It may be somewhat easier with split:

var inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;

var result = inputText.split(/[\r\n]*(?=^\d+\/)/m).filter(Boolean);

console.log(result);

If you then want to replace the \r and \n characters with a space, add a map:

var inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;

var result = inputText.split(/[\r\n]*(?=^\d+\/)/m).filter(Boolean)
    .map(text => text.replace(/[\r\n]+/g, " "));

console.log(result);

Explanation

The regular expression breaks down into the following parts:

[\r\n]*: any number of newline characters (carriage returns or line feeds)
(?= ): look-ahead to see whether pattern matches the next characters, without actually matching ("eating") them
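^\d+\/: the pattern that denotes the start of a message line: one or more digits followed by a forward slash. Because of the m flag, ^ matches at the start of every line, not only at the start of the input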
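The look-ahead above only checks for digits and a slash, so a continuation line that happens to begin with something like "1/2" would also trigger a split. The sketch below compares it with a stricter variant that requires a full date followed by a dash. The stricter pattern and the sample text are assumptions for illustration, not part of the original answer, and they assume every message header looks like m/d/y - name.

```javascript
// Sample text (made up): the second line is a continuation that starts with "1/2".
const inputText = `3/24/18 - Username : I used 1/2 cup,
1/2 cup was enough
3/24/18 - anotherUser : another message`;

// Pattern from the answer: split before any line starting with digits and "/".
const loose = inputText.split(/[\r\n]*(?=^\d+\/)/m).filter(Boolean);

// Hypothetical stricter pattern: the next line must start with a full
// m/d/y date, optional whitespace, a dash, and a whitespace character.
const strict = inputText
  .split(/[\r\n]+(?=^\d{1,2}\/\d{1,2}\/\d{2,4}\s*-\s)/m)
  .filter(Boolean);

console.log(loose.length);  // 3: the continuation line was split off
console.log(strict.length); // 2: the continuation stays with its message
```

Whether the extra strictness is worth it depends on how regular the export format really is.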

Note that the regular expression will match the parts that should define the splits; those characters will not appear in the output. That is why the date-pattern is verified with look-ahead -- we don't want to lose those characters; they belong to the next line.
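To make the role of the look-ahead concrete, here is a small comparison between a separator that consumes the start of the date and one that only looks ahead at it. The variable names are illustrative only.

```javascript
const inputText = `3/24/18 - Username : message here
3/24/18 - anotherUser : another message`;

// Without look-ahead, the "3/" of the next message is part of the
// separator, so split() throws it away.
const consumed = inputText.split(/[\r\n]+\d+\//);

// With look-ahead, only the newline characters are consumed.
const kept = inputText.split(/[\r\n]+(?=\d+\/)/);

console.log(consumed[1]); // "24/18 - anotherUser : another message"
console.log(kept[1]);     // "3/24/18 - anotherUser : another message"
```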

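If the input begins with newline characters (for example, when the template literal starts on its own line), the split pattern matches at the very start of the input and generates an empty string for what precedes the split; this should be ignored. That is what .filter(Boolean) does: empty strings are falsy, so the filter leaves them out. With the input exactly as shown above, no empty string is produced and the filter is only a safeguard.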
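A minimal sketch of the filter step, assuming an input that starts with a line break (as happens when the template literal opens on its own line):

```javascript
// The text begins with "\n" because the literal starts on a new line.
const inputText = `
3/24/18 - Username : message here
3/24/18 - anotherUser : another message`;

const pattern = /[\r\n]*(?=^\d+\/)/m;

// The leading newline is a separator with nothing before it,
// so the first element of the result is an empty string.
const raw = inputText.split(pattern);

// Boolean("") is false, so the empty entry is dropped.
const cleaned = raw.filter(Boolean);

console.log(raw.length);     // 3
console.log(cleaned.length); // 2
```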
