I have a file that contains text like the following:
adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7
I want to insert a newline (\n) before each tag (adj, n, adv, etc.) and before each untagged number, to get the following output:
adj 1: text1
2: text2
n 1: text4
adj 1: text5
adv 1: text6
3: text7
I have this regex: \s+\d+|\s+((n|v|adv|adj|)\s+\d+)
Now, if I use Regex.Replace(), it adds the newline but also removes the matched text (1, 2, n 1, and so on). Is there any way to insert a newline before the match without removing the match?
Upvotes: 0
Views: 1321
Reputation: 2268
Use capture groups.
For generic prefixes, not limited to (n|v|adv|adj),
search for ((\w*?\s)?\d+: [\w]*?($|\s))
For prefixes limited to (n|v|adv|adj),
search for (((n|v|adv|adj)\s)?\d+: [\w]*?($|\s))
In both cases, replace with $1\n so that the captured text is written back, followed by a newline.
https://regex101.com/r/vJ1lY1/3
https://msdn.microsoft.com/en-us/library/ewy2t5e0(v=vs.110).aspx
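As a rough illustration of the capture-group idea in C#, here is a minimal sketch. It is a variant of the second pattern above, not the answer's exact regex: the trailing whitespace is moved outside group 1 so that it is not written back, and the final newline is trimmed. The class and method names (CaptureGroupDemo, InsertNewlines) are hypothetical, and the pattern assumes each entry's text is a single word, as in the sample.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureGroupDemo
{
    // Hypothetical helper: puts each "tag number: text" entry on its own line.
    public static string InsertNewlines(string input)
    {
        // Group 1: optional tag + space, digits, colon, space, one word of text.
        // The whitespace (or end of input) after the entry is matched but not captured.
        string pattern = @"((?:(?:n|v|adv|adj)\s)?\d+: \w*)(?:\s|$)";

        // "$1" writes the captured entry back; "\n" is appended after it.
        // The last entry also gets a newline, so trim it from the end.
        return Regex.Replace(input, pattern, "$1\n").TrimEnd('\n');
    }

    static void Main()
    {
        string input = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
        Console.WriteLine(InsertNewlines(input));
    }
}
```

For the sample input this should print the six lines asked for in the question. Entries whose text contains spaces would need a different pattern for the text part.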
Upvotes: 1
Reputation: 626851
Since my comment was almost correct, I decided to improve it and turn it into an answer. The main point is that you have a set of keywords that you can put into an alternation group, and since you know they are followed by a space and then digits followed by a colon, you may define this block as a separate string. Then you may match any number of any characters up to the first occurrence of this same block.
Here is a sample demo:
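using System.Text.RegularExpressions;

var s = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
// One entry header: optional tag, optional whitespace, digits, colon.
var block = @"(?:[nv]|ad[vj])?\s*\d+:";
// A header plus everything up to (but not including) the next header.
var pat = string.Format(@"{0}.*?\s*(?={0})", block);
// $& is the whole match, so the matched text is kept and a newline is appended.
var result = Regex.Replace(s, pat, "$&\n");
// => adj 1: text1
// 2: text2
// n 1: text4
// adj 1: text5
// adv 1: text6
// 3: text7
// (the whitespace matched by \s* stays at the end of each line except the last)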
Pattern details:
- (?:[nv]|ad[vj])?\s*\d+: - matches 1 or 0 occurrences of n, v, adj, or adv, then 0+ whitespaces and 1+ digits followed by a colon.
- .*?\s* - 0+ any chars but a newline up to the first 0+ whitespaces that...
- (?=(?:[nv]|ad[vj])?\s*\d+:) - ...are followed by the block described above.
See the regex demo.
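If the trailing whitespace on each line is unwanted, one possible alternative is to match only the whitespace that separates two entries and replace it with a newline, so nothing else is consumed. The sketch below is a different technique from the pattern described above: it uses a lookahead for the next entry header plus a negative lookbehind so that the space between a tag and its number is left alone. It relies on .NET's support for variable-length lookbehind, and the names (LookaroundDemo, InsertNewlines) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Hypothetical helper: replaces the whitespace before each entry with a newline.
    public static string InsertNewlines(string input)
    {
        // A tag is n, v, adv, or adj starting at a word boundary.
        string tag = @"\b(?:[nv]|ad[vj])";

        // (?<!tag\s*)        : the whitespace must not directly follow a tag
        // \s+                : the separating whitespace, which is what gets replaced
        // (?=(?:tag\s+)?\d+:) : an entry header (optional tag, digits, colon) must follow
        string pattern = string.Format(@"(?<!{0}\s*)\s+(?=(?:{0}\s+)?\d+:)", tag);

        return Regex.Replace(input, pattern, "\n");
    }

    static void Main()
    {
        string input = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
        Console.WriteLine(InsertNewlines(input));
    }
}
```

Because only the separating whitespace is matched, the tags, numbers, and text are never removed. One known limitation of this sketch: if an entry's text ends with a standalone word n or v, the lookbehind treats it as a tag and no line break is inserted after it.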
Upvotes: 1