Reputation: 450
I have a text file that looks like this:
Current File: week-28\gcweb.txt (=>) ########## Old File: week-27\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\gcckup.txt (=>) ########## Old File: week-27\gcckup.txt (<=)
2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------
The text from Current File to ------ indicates one para, or one part. I need to get each of these parts separately and then apply some other operations to them. I tried using a regex to get the entire text starting from Current File.
The regex I used is:
\bCurrent File\b.+
My question is: how can I select the whole text of one para? Having little experience with regex I am hoping to get something like this:
Current File: week-28\gcweb.txt Old File: week-27\gcweb.txt
2019-07-21 13:20:42 ip-172-17-3-71 2019-07-17 13:27:12 ip-172-17-3-71
Here (=>) and (<=) are simply indicators for current and old. To get the file path I tried \bCurrent File\b.+(=>), but this gives (=>) as the group.
I need help with extracting the strings so that I can apply the rest of the operations on them after this.
Upvotes: 1
Views: 336
Reputation: 163457
Another option, which captures both filenames in groups and matches the rest of the block, could be:
Current File: (\S+\.txt)[^O]*(?:O(?!ld File)|[^O])+ Old File: (\S+\.txt).*(?:\r?\n(?!--).*)*(?=\r?\n--)
Explanation, part by part:
Current File: (\S+\.txt)
Match Current File: and capture the filename in group 1
[^O]*
Match 0+ times any char except O
(?:
Non capturing group
O(?!ld File)
Match O, assert what is directly on the right is not ld File
|
Or
[^O]
Match any char except O
)+
Close non capturing group and repeat 1+ times
 Old File: (\S+\.txt)
Match space, Old File: and capture the filename in group 2
.*
Match any char except newline 0+ times
(?:
Non capturing group
\r?\n(?!--)
Match a newline and assert what is on the right is not --
.*
Match any char except a newline 0+ times
)*
Close non capturing group and repeat 0+ times
(?=\r?\n--)
Positive lookahead, assert what is on the right is a newline and --
const regex = /Current File:[ \t]*(\S+\.txt)[^O]*(?:O(?!ld File)|[^O])+ Old File:[ \t]*(\S+\.txt).*(?:\r?\n(?!--).*)*(?=\r?\n--)/gm;
const str = `Current File: week-28\\gcweb.txt (=>) ########## Old File: week-27\\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\\gcckup.txt (=>) ########## Old File: week-27\\gcckup.txt (<=)
2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;
let m;
while ((m = regex.exec(str)) !== null) {
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
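To get from those groups to the two-line shape asked for in the question, something along these lines might work. This is only a sketch: summarizeBlocks and the header/entries property names are made up for illustration, and it assumes every data line ends in a => or <= marker as in the sample.

```javascript
// Sample input copied from the question (backslashes escaped for the template literal).
const input = `Current File: week-28\\gcweb.txt (=>) ########## Old File: week-27\\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\\gcckup.txt (=>) ########## Old File: week-27\\gcckup.txt (<=)
2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;

// Same pattern as in the snippet above.
const blockPattern = /Current File:[ \t]*(\S+\.txt)[^O]*(?:O(?!ld File)|[^O])+ Old File:[ \t]*(\S+\.txt).*(?:\r?\n(?!--).*)*(?=\r?\n--)/g;

// Hypothetical helper: returns one { header, entries } object per block.
function summarizeBlocks(text) {
  const results = [];
  for (const m of text.matchAll(blockPattern)) {
    const [whole, currentFile, oldFile] = m;
    // Everything after the header line is data; strip the trailing => / <= markers.
    const entries = whole
      .split(/\r?\n/)
      .slice(1)
      .map((line) => line.replace(/\s*(?:=>|<=)\s*$/, ""))
      .filter((line) => line.length > 0);
    results.push({
      header: `Current File: ${currentFile} Old File: ${oldFile}`,
      entries: entries.join(" "),
    });
  }
  return results;
}

const summaries = summarizeBlocks(input);
for (const s of summaries) {
  console.log(s.header);
  console.log(s.entries);
}
```

Each block then prints as a header line followed by one line with the current and old entries joined by a space.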
Upvotes: 1
Reputation: 27733
One option is an expression that lazily matches everything from Current File: up to the first --, for instance:
Current File:[\s\S]*?(?=--)
The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it; there you can also watch how it matches against some sample inputs.
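As a rough illustration of how that lazy expression splits the sample into parts, here is a small sketch. The names sampleText and paragraphs are just for this example, and it relies on -- not appearing anywhere inside a part, which holds for the sample but is an assumption about the real file.

```javascript
// The question's sample, built line by line.
const sampleText = [
  "Current File: week-28\\gcweb.txt (=>) ########## Old File: week-27\\gcweb.txt (<=)",
  "2019-07-21 13:20:42 ip-172-17-3-71=>",
  "2019-07-17 13:27:12 ip-172-17-3-71<=",
  "--------------------------------------------------",
  "--------------------------------------------------",
  "Current File: week-28\\gcckup.txt (=>) ########## Old File: week-27\\gcckup.txt (<=)",
  "2019-07-21 13:20:46 ip-172-17-2-101=>",
  "2019-07-17 13:27:14 ip-172-17-2-101<=",
  "--------------------------------------------------",
  "--------------------------------------------------",
].join("\n");

// [\s\S] also matches newlines, and *? stops at the first place the lookahead sees "--".
// match() with the g flag returns an array of whole matches, or null when nothing matches.
const paragraphs = (sampleText.match(/Current File:[\s\S]*?(?=--)/g) || [])
  // Each match ends with the newline that precedes the separator; trim it off.
  .map((part) => part.trimEnd());

paragraphs.forEach((part, i) => {
  console.log(`Part ${i + 1}:\n${part}\n`);
});
```

Single hyphens in week-28 or in the dates do not end a match early, because the lookahead needs two hyphens in a row.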
To get the .txt paths, we can likely use an expression similar to:
Current File:\s*(\S+\.txt).*Old File:\s*(\S+\.txt)[\s\S]*?(?=-{4,})
const regex = /Current File:\s*(\S+\.txt).*Old File:\s*(\S+\.txt)[\s\S]*?(?=-{4,})/gm;
const str = `Current File: week-28\\gcweb.txt (=>) ########## Old File: week-27\\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\\gcckup.txt (=>) ########## Old File: week-27\\gcckup.txt (<=)
2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
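If the goal is to run further operations on each part, it may be handier to turn the matches into plain objects. The sketch below is a variation on the expression above, not the same pattern: it adds named groups plus a body group for the data lines, and parseBlocks, rowPattern, and the property names are invented for illustration. It assumes each data line looks like date, time, host, then => or <=.

```javascript
const logText = `Current File: week-28\\gcweb.txt (=>) ########## Old File: week-27\\gcweb.txt (<=)
2019-07-21 13:20:42 ip-172-17-3-71=>
2019-07-17 13:27:12 ip-172-17-3-71<=
--------------------------------------------------
--------------------------------------------------
Current File: week-28\\gcckup.txt (=>) ########## Old File: week-27\\gcckup.txt (<=)
2019-07-21 13:20:46 ip-172-17-2-101=>
2019-07-17 13:27:14 ip-172-17-2-101<=
--------------------------------------------------
--------------------------------------------------`;

// Variation of the pattern above with named groups; "body" holds the data lines.
const namedPattern = /Current File:\s*(?<current>\S+\.txt).*Old File:\s*(?<old>\S+\.txt).*\r?\n(?<body>[\s\S]*?)(?=\r?\n-{4,})/g;

// Assumed line shape: "<date> <time> <host>" followed directly by => or <=.
const rowPattern = /^(\S+ \S+) (\S+?)(=>|<=)$/;

// Hypothetical helper: one object per part, with its data lines parsed into rows.
function parseBlocks(text) {
  return [...text.matchAll(namedPattern)].map((m) => {
    const { current, old, body } = m.groups;
    const rows = body.split(/\r?\n/).flatMap((line) => {
      const row = rowPattern.exec(line.trim());
      if (!row) return []; // skip lines that do not fit the assumed shape
      return [{
        timestamp: row[1],
        host: row[2],
        side: row[3] === "=>" ? "current" : "old",
      }];
    });
    return { current, old, rows };
  });
}

const blocks = parseBlocks(logText);
console.log(JSON.stringify(blocks, null, 2));
```

From there the rows can be filtered or compared per file without touching the regex again.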
Upvotes: 1