Using regex to find start and end of paragraphs in text file

Question

I have a text file that looks like this:

Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)



2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\gcckup.txt (=>) ########## Old File: week-27\gcckup.txt (<=)



2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------

The text from "Current File" down to the "------" lines forms one paragraph (one part). I need to extract each of these parts separately and then apply some other operations to them. I tried using a regex to get the entire text starting from "Current File".

The regex I used is:

\bCurrent File\b.+
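A minimal JavaScript check (JavaScript being the language of the accepted answer's demo) shows why this pattern stops short. Without the s flag, "." does not match line terminators, so ".+" ends at the first newline and only the header line of each paragraph is returned. The two-line sample string is an excerpt of the file above.

```javascript
// Two-line excerpt of the sample file; String.raw keeps the backslashes in the paths.
const sample = String.raw`Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>`;

// "." does not match "\n", so ".+" stops at the end of the first line.
const firstLineOnly = sample.match(/\bCurrent File\b.+/)[0];
console.log(firstLineOnly);
```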

My question is: how can I select the whole text of one paragraph? I have little experience with regex, and I am hoping to get something like this:

Current File: week-28\gcweb.txt      Old File: week-27\gcweb.txt
2019-07-21 13:20:42 ip-172-17-3-71   2019-07-17 13:27:12 ip-172-17-3-71

The (=>) and (<=) markers simply indicate "current" and "old". To get the file path, I tried \bCurrent File\b.+(=>), but this gives (=>) as a group.

I need help extracting these strings so that I can apply the remaining operations to them.
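For reference, here is a small JavaScript sketch of the capture-group behaviour described above. In a regex, unescaped parentheses create a capturing group, so (=>) captures the two characters => instead of matching the literal text "(=>)". Escaping them as \(=>\) matches the literal parentheses, and a separate group such as (\S+) can capture the path instead. The header string is an excerpt of the sample file, and the patterns are illustrative rather than the only option.

```javascript
// Header line from the sample file; String.raw keeps the backslashes in the paths.
const header = String.raw`Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)`;

// Unescaped: (=>) is a capturing group that matches the two characters "=>".
const unescaped = header.match(/\bCurrent File\b.+(=>)/);
console.log(unescaped[1]);

// Escaped: \( and \) match literal parentheses, and (\S+) captures the path before them.
const currentPath = header.match(/\bCurrent File:\s*(\S+)\s*\(=>\)/)[1];
const oldPath = header.match(/Old File:\s*(\S+)\s*\(<=\)/)[1];
console.log(currentPath, oldPath);
```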

Emma · Accepted Answer

You could, for instance, design an expression that looks like this:

Current File:[\s\S]*?(?=--)
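As a minimal sketch of how this expression separates the paragraphs, the JavaScript below runs it with matchAll over the sample file. [\s\S] matches any character, including newlines, and the lazy *? stops at the first position where the lookahead (?=--) sees the dashed separator. This assumes no paragraph body itself contains "--"; the sample does not.

```javascript
// Sample file content; String.raw keeps the Windows-style backslashes in the paths.
const text = String.raw`Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)



2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\gcckup.txt (=>) ########## Old File: week-27\gcckup.txt (<=)



2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;

// Each match runs from "Current File:" up to (not including) the first "--".
const paragraphs = [...text.matchAll(/Current File:[\s\S]*?(?=--)/g)].map((m) => m[0]);

console.log(paragraphs.length);
paragraphs.forEach((p, i) => console.log(`--- paragraph ${i + 1} ---\n${p.trim()}`));
```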

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against sample inputs.

Edit:

To get the .txt paths, we can likely use an expression similar to:

Current File:\s*(\S+\.txt).*Old File:\s*(\S+\.txt)[\s\S]*?(?=-{4,})

JavaScript demo:

const regex = /Current File:\s*(\S+\.txt).*Old File:\s*(\S+\.txt)[\s\S]*?(?=-{4,})/gm;
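// String.raw keeps the backslashes in the Windows-style paths; in a plain
// template literal, "\g" is read as an escape and collapses to "g".
const str = String.raw`Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)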



2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\gcckup.txt (=>) ########## Old File: week-27\gcckup.txt (<=)



2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
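Building on the path expression above, here is a sketch that also captures the two timestamp lines and lays each record out in the two-column form the question asks for. The extra groups (.+?)=> and (.+?)<= are an extension added for illustration, and the three-space column gutter is an arbitrary choice; both assume every record follows the sample's layout.

```javascript
// Sample file content; String.raw keeps the backslashes in the Windows-style paths.
const fileText = String.raw`Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)



2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\gcckup.txt (=>) ########## Old File: week-27\gcckup.txt (<=)



2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;

// Groups: 1 = current path, 2 = old path, 3 = line ending in "=>", 4 = line ending in "<=".
// ".*" skips the rest of the header line and "\s+" skips the blank lines (LF or CRLF).
const recordRe = /Current File:\s*(\S+\.txt).*Old File:\s*(\S+\.txt).*\s+(.+?)=>\s+(.+?)<=/g;

// Lays one record out as two aligned columns: current on the left, old on the right.
function formatRecord([, curPath, oldPath, curStamp, oldStamp]) {
  const left = [`Current File: ${curPath}`, curStamp];
  const right = [`Old File: ${oldPath}`, oldStamp];
  // Pad the left column to its longest cell plus a three-space gutter.
  const width = Math.max(...left.map((s) => s.length)) + 3;
  return left.map((s, i) => s.padEnd(width) + right[i]).join("\n");
}

const records = [...fileText.matchAll(recordRe)].map(formatRecord);
records.forEach((r) => console.log(r + "\n"));
```

Each match's four groups can also be used directly for whatever further processing is needed, instead of being formatted.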
