My goal is to extract messages from an exported conversation that looks like this:
inputText = `3/24/18 - Username : message here
3/24/18 - anotherUser : another message`
My naive approach was to split on every newline, using
arr = inputText.match(/[^\r\n]+/g)
(source: "JS regex to split by line"), which works perfectly.
But now I'm facing a case I didn't think of earlier: a user sending a multi-line message, like:
inputText = `3/24/18 - Username : message here,
other text, same message
3/24/18 - anotherUser : another message`
With that input, my naive approach produces the wrong output:
arr = ['3/24/18 - Username : message here,',
'other text, same message',
'3/24/18 - anotherUser : another message']
whereas I need it to be:
arr = ['3/24/18 - Username : message here, other text, same message',
'3/24/18 - anotherUser : another message']
I need to split lines, but only where a line starts with the pattern m/d/y - username :
If your lines always start with a date formatted as in your example, you can match on that. It is perhaps a bit easier with split:
var inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;
var result = inputText.split(/[\r\n]*(?=^\d+\/)/m).filter(Boolean);
console.log(result);
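If you would rather not rely on a single split regex, the same grouping can be done with a plain loop over the lines. This is a swapped-in alternative, not the approach above, and the helper name groupMessages is hypothetical; it makes the same assumption as the split, namely that every message header starts with digits followed by a slash.

```javascript
// Hypothetical helper (alternative to the split regex): walks the export line by line.
// Assumption: a line starting with digits followed by "/" begins a new message;
// any other line is a continuation of the previous message.
function groupMessages(text) {
  const messages = [];
  for (const line of text.split(/\r?\n/)) {
    if (/^\d+\//.test(line)) {
      messages.push(line); // header line: start a new message
    } else if (messages.length > 0) {
      messages[messages.length - 1] += "\n" + line; // continuation line
    }
    // Lines before the first header (e.g. leading blank lines) are ignored.
  }
  return messages;
}

const inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;

console.log(groupMessages(inputText));
```

For input with plain \n line endings this yields the same array as the split/filter version; continuation lines are rejoined with "\n" even if the export used \r\n.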
If you then want to replace each run of \r and \n characters with a single space, add a map:
var inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;
var result = inputText.split(/[\r\n]*(?=^\d+\/)/m).filter(Boolean)
.map(text => text.replace(/[\r\n]+/g, " "));
console.log(result);
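If the end goal is to pull the individual fields out of each message, the joined strings can be parsed one step further. This is only a sketch: it assumes every message follows the m/d/y - username : text layout from the question and that usernames contain no colon; parseMessage and the date/user/text field names are hypothetical.

```javascript
// Hypothetical helper: splits one message into date, user and text.
// Assumes the layout "m/d/y - username : text" (spaces around "-" optional)
// and a username without ":"; the text itself may contain colons.
function parseMessage(message) {
  const match = /^(\d{1,2}\/\d{1,2}\/\d{2,4})\s*-\s*(.*?)\s*:\s*([\s\S]*)$/.exec(message);
  if (match === null) {
    return null; // not in the expected layout
  }
  const [, date, user, text] = match;
  return { date, user, text };
}

const inputText = `3/24/18 - Username : message here
message here too!!
3/24/18- anotherUser : another message`;

const messages = inputText
  .split(/[\r\n]*(?=^\d+\/)/m)
  .filter(Boolean)
  .map(text => text.replace(/[\r\n]+/g, " "))
  .map(parseMessage);

console.log(messages);
```

Messages that don't fit the assumed layout come back as null, so they can be filtered out or logged separately.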
The regular expression breaks down into the following parts:
[\r\n]* : any number of newline characters
(?= ) : a look-ahead, which checks whether the enclosed pattern matches the next characters without actually consuming ("eating") them
^\d+\/ : the pattern that marks the start of a message line: one or more digits followed by a forward slash. Thanks to the m flag, ^ matches at the start of every line, not only at the start of the input.
Note that the regular expression matches the parts that define the splits, and those characters do not appear in the output. That is why the date pattern is checked with a look-ahead: we don't want to lose those characters, since they belong to the next line.
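Because the look-ahead only checks for digits followed by a slash, a continuation line that happens to begin with something like "1/2 of ..." would be treated as a new message. A stricter look-ahead can require the whole m/d/y - username : header. This is a sketch; the header regex (one- or two-digit month and day, two- to four-digit year, optional space before the dash, a colon later on the line) is an assumption based on the sample in the question.

```javascript
// Stricter split: only break before lines that look like "m/d/y - name :".
// The header pattern is an assumption derived from the sample export.
const headerSplit = /[\r\n]*(?=^\d{1,2}\/\d{1,2}\/\d{2,4}\s*-[^\r\n]*:)/m;

const inputText = `3/24/18 - Username : message here
1/2 of this line belongs to the first message
3/24/18- anotherUser : another message`;

const result = inputText.split(headerSplit).filter(Boolean);
console.log(result);
```

With the looser ^\d+\/ look-ahead, the same input would be split into three pieces, because the "1/2" line also starts with digits and a slash.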
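If the input begins with newline characters (for example a leading blank line), the split pattern matches right at the start, and split produces an empty string for what precedes that match; this should be ignored. That is what .filter(Boolean) does: empty strings are falsy, so the filter leaves them out. (A zero-length match at index 0, as with the sample input, does not produce a leading empty string in JavaScript, so there the filter is only a safeguard.)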
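To see when the filter actually matters, compare the raw split output for the sample with the output for the same text preceded by a blank line. A small sketch; splitPattern and sample are illustrative names only.

```javascript
// Same split pattern as in the answer.
const splitPattern = /[\r\n]*(?=^\d+\/)/m;

const sample = `3/24/18 - Username : message here
3/24/18- anotherUser : another message`;

// No leading newline: the zero-length match at index 0 is skipped, so no "" appears.
console.log(sample.split(splitPattern));
// Leading newline: "\n" is consumed as a separator at index 0, leaving "" in front.
console.log(("\n" + sample).split(splitPattern));
// filter(Boolean) drops the falsy "" entries.
console.log(("\n" + sample).split(splitPattern).filter(Boolean));
```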