Andy Harvey

Reputation: 12653

How to regex match and return strings with a known start format and ending with a double line break?

I'm trying to parse a text file with JavaScript. I have no control over the contents of the text file.

The text file consists of multiple records. Each record begins with a HH:MM timestamp. Each record is separated by a double line break \n\n. Records may be a single line, or may be multiple lines separated by a single line break \n.

example:

09:00\tRecordA
\tSome extra detail about Record A

10:00\tRecordB
\tSome extra detail about Record B
\tEven more detail about Record B

11:00\tRecordC

I hope to generate an array of records like this:

[ 
    "09:00\tRecordA\n\tSome extra detail about Record A",
    "10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B",
    "11:00\tRecordC"
]

So far I can get the first lines without problem.

textFile.match(/^\d\d:\d\d.*\n?/gm);

[ 
    "09:00\tRecordA",
    "10:00\tRecordB",
    "11:00\tRecordC"
]

After a lot of searching, trial and error I'm still having trouble getting the extra details. Below are what appeared to be the most promising avenues, but I'm probably far from the mark.

Adding an extra \n, but as the wildcard doesn't match line breaks this obviously did not work.

textFile.match(/^\d\d:\d\d.*\n\n?/gm);

Using the s (dotAll) flag, but this did not split records into separate array items.

textFile.match(/^\d\d:\d\d.*\n?/sgm); 

[ 
    "09:00\tRecordA\n\tSome extra detail about Record A\n\n10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B"
]

Defining a group and repeating it twice, but this returned null.

textFile.match(/^\d\d:\d\d.*(\n){2}?/gm);

My regex skills are quite limited and I'm trying to learn. Would appreciate any pointers and advice on this problem.

Upvotes: 0

Views: 77

Answers (1)

Sam Washington

Reputation: 670

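The m (multiline) flag won't get you there on its own: it only makes ^ and $ match at the start and end of each line, and . still stops at a line break, so each match is limited to a single line.

Without m, ^ will only match at the beginning of the text.

The usual way to match any character including line breaks is [^] ("not nothing"), but a greedy [^]* will run on to the last line break in the text instead of stopping at the end of the record.

There might be a way to do it with a regex alone.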
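One possible pattern, as a rough sketch and not a definitive answer: pair [^] with a lazy quantifier and a lookahead, so each match stops just before the next double line break or at the end of the text. The sample string is made up to mirror the question's example, and Unix \n line endings are assumed.

```javascript
// Sample text modelled on the question's example (assumed: tabs and \n line endings).
const textFile =
  "09:00\tRecordA\n\tSome extra detail about Record A\n\n" +
  "10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B\n\n" +
  "11:00\tRecordC";

// ^ with the m flag   : a record must start at the beginning of a line
// [^]*?               : lazily match any characters, line breaks included
// (?=\n\n|(?![^]))    : stop before a double line break, or at the very end of the text
const recordPattern = /^\d\d:\d\d[^]*?(?=\n\n|(?![^]))/gm;

// match() returns null when nothing matches, so fall back to an empty array.
const records = textFile.match(recordPattern) || [];

console.log(records);
```

If the file ends with a trailing line break, the last match will include it, so a trim on each item may be needed.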

But you could consider .split("\n\n") instead, which cuts the text at each double line break.
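A minimal sketch of the split approach, assuming the file uses \n line endings and that every record really does start with an HH:MM timestamp as described in the question. The sample string is invented to match the question's example, and the filter step is an extra precaution, not something the question requires.

```javascript
// Sample text modelled on the question's example (assumed: tabs and \n line endings).
const textFile =
  "09:00\tRecordA\n\tSome extra detail about Record A\n\n" +
  "10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B\n\n" +
  "11:00\tRecordC";

const records = textFile
  // Cut the text wherever two line breaks appear in a row.
  .split("\n\n")
  // Keep only chunks that begin with an HH:MM timestamp, dropping empty or stray chunks.
  .filter((chunk) => /^\d\d:\d\d/.test(chunk));

console.log(records);
```

If the file might contain Windows line endings, splitting on a regex such as /\r?\n\r?\n/ instead of the plain string would be one way to handle both.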

Upvotes: 2
