user2678324

Insert newline before the regex match with c#

I have a file that contains text like the following:

adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7

I want to insert a newline (\n) before each entry, that is, before a tag such as adj, n or adv, or before a bare number, to get the following output:

adj 1: text1 
2: text2 
n 1: text4 
adj 1: text5 
adv 1: text6 
3: text7

I have this regex: \s+\d+|\s+((n|v|adv|adj|)\s+\d+)

If I use Regex.Replace() with it, the newline is added, but the matched text (1, 2, n 1 and so on) is removed. Is there any way to insert a newline before the match without removing the match itself?

Upvotes: 0

Views: 1321

Answers (2)

yosefrow

Reputation: 2268

Use capture groups.

For generic prefixes, not limited to (n|v|adv|adj), search for:

((\w*?\s)?\d+: [\w]*?($|\s))

For prefixes limited to (n|v|adv|adj), search for:

(((n|v|adv|adj)\s)?\d+: [\w]*?($|\s))

Replace with:

$1\n

https://regex101.com/r/vJ1lY1/3

https://msdn.microsoft.com/en-us/library/ewy2t5e0(v=vs.110).aspx
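
In C#, the search-and-replace above might be applied roughly as follows. This is only a minimal sketch using the restricted-prefix pattern; the names `CaptureGroupDemo` and `SplitEntries` are made up for illustration. Because the last entry matches through `$`, the raw result ends with one extra newline, which the sketch trims.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Hypothetical helper: appends a newline after every "[tag ]N: word" entry.
    public static string SplitEntries(string input)
    {
        // Group 1 wraps the whole entry, so "$1" puts the matched text back.
        const string pattern = @"(((n|v|adv|adj)\s)?\d+: [\w]*?($|\s))";
        string result = Regex.Replace(input, pattern, "$1\n");
        // The final entry matches through "$", leaving one trailing newline.
        return result.TrimEnd('\n');
    }

    public static void Main()
    {
        var s = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
        Console.WriteLine(SplitEntries(s));
    }
}
```

Each output line keeps the space that followed the entry in the input, since that space is part of the captured match.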

Upvotes: 1

Wiktor Stribiżew

Reputation: 626851

Since my comment was almost correct, I decided to improve it and turn it into an answer. The main point is that you have a set of keywords that you can put into an alternation group, and since you know they are followed by a space and then digits followed by a colon, you can define this block as a separate string. Then you can match any number of any characters up to the next occurrence of this same block.

Here is a sample demo:

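// Requires: using System.Text.RegularExpressions;
var s = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
// Entry header: optional n/v/adj/adv tag, optional whitespace, digits, colon.
var block = @"(?:[nv]|ad[vj])?\s*\d+:";
// One entry: a header, then as few chars as possible up to the next header.
var pat = string.Format(@"{0}.*?\s*(?={0})", block);
// $& re-inserts the whole match, so only the newline is added.
var result = Regex.Replace(s, pat, "$&\n");
// => adj 1: text1 
//2: text2 
//n 1: text4 
//adj 1: text5 
//adv 1: text6 
//3: text7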

Pattern details:

  • (?:[nv]|ad[vj])?\s*\d+: - matches 1 or 0 occurrences of n, v, adj or adv, then 0+ whitespaces and 1+ digits followed by a colon.
  • .*?\s* - 0+ chars other than a newline, as few as possible, then 0+ whitespaces that ...
  • (?=(?:[nv]|ad[vj])?\s*\d+:) - ... are followed by the block described above (a lookahead, so the next block is checked but not consumed).

See the regex demo
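
To put the newline before each match rather than after it, as the question literally asks, the `$&` substitution can simply follow the newline in the replacement string. The following is a hedged sketch of that variation, not the pattern from this answer: it matches only the entry header, and the `\b` anchor, the required whitespace after the tag, and the names `InsertBeforeDemo` and `BreakBeforeHeaders` are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InsertBeforeDemo
{
    // Hypothetical helper: puts a newline in front of every entry header.
    public static string BreakBeforeHeaders(string input)
    {
        // Entry header: optional tag plus whitespace, then digits and a colon.
        // \b keeps a tag letter from matching at the end of a longer word.
        const string header = @"\b(?:(?:n|v|adv|adj)\s+)?\d+:";
        // "$&" stands for the whole match, so the header itself is kept.
        string result = Regex.Replace(input, header, "\n$&");
        // The first header sits at the very start, so drop the leading newline.
        return result.TrimStart('\n');
    }

    public static void Main()
    {
        var s = "adj 1: text1 2: text2 n 1: text4 adj 1: text5 adv 1: text6 3: text7";
        Console.WriteLine(BreakBeforeHeaders(s));
    }
}
```

Replacing with a plain "\n" instead of "\n$&" would reproduce the problem from the question, because the matched header would be dropped.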

Upvotes: 1
