Reputation: 3157
How can I remove the first paragraph tag and its contents from a string?
Actual String
<p>Hello</p> <p>World</p>
Result
<p>World</p>
One option is to find the position of the first <p> and the first </p>, and then replace everything up to and including that </p> with "".
How can this be achieved with regex?
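For comparison with the regex answers, the index-based option described above could be sketched roughly as follows. The class and method names (IndexBasedRemover, RemoveFirstParagraph) are made up for illustration, and this variant removes only the span from the first <p> to its closing </p> rather than everything from the start of the string.

```csharp
using System;

public static class IndexBasedRemover
{
    // Hypothetical helper: removes the first <p>...</p> span using plain index lookups.
    public static string RemoveFirstParagraph(string html)
    {
        const string open = "<p>";
        const string close = "</p>";

        // Locate the first opening tag; leave the input alone if there is none.
        int start = html.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return html;

        // Locate the first closing tag that follows that opening tag.
        int end = html.IndexOf(close, start + open.Length, StringComparison.Ordinal);
        if (end < 0) return html;

        // Remove everything from the opening tag through the end of the closing tag.
        return html.Remove(start, end + close.Length - start);
    }

    public static void Main()
    {
        // The space that separated the two paragraphs is left in place.
        Console.WriteLine("[" + RemoveFirstParagraph("<p>Hello</p> <p>World</p>") + "]");
    }
}
```

Call Trim() on the result if the leftover whitespace matters.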
Upvotes: 1
Views: 3419
Reputation: 70750
Use the Regex.Replace method, setting the count (the maximum number of replacements that can occur) to 1.
Regex rgx = new Regex(@"<p>.*?</p>");
String input = @"<p>Hello</p> <p>World</p>";
String result = rgx.Replace(input, "", 1);
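As a self-contained sketch of the same idea, the count-limited replace can be wrapped in a small helper. The names CountLimitedRemover and RemoveFirst are illustrative only, and RegexOptions.Singleline is an added assumption so that a paragraph spanning several lines is still matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CountLimitedRemover
{
    // Lazy match of one <p>...</p> pair; Singleline lets '.' match newlines too.
    private static readonly Regex Paragraph =
        new Regex(@"<p>.*?</p>", RegexOptions.Singleline);

    // Hypothetical helper: the count argument (1) stops after the first match.
    public static string RemoveFirst(string html)
    {
        return Paragraph.Replace(html, "", 1);
    }

    public static void Main()
    {
        // Only the first paragraph is removed; the separating space remains.
        Console.WriteLine("[" + RemoveFirst("<p>Hello</p> <p>World</p>") + "]");
    }
}
```

The result keeps the space that followed the removed paragraph, so trim it if an exact "<p>World</p>" is needed.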
Upvotes: 1
Reputation: 41848
Apart from the warnings about using regex to parse html...
A. If First Paragraph Always Starts at the Beginning of the String
^<p>.*?</p>
The ^ anchor asserts that we are at the beginning of the string. The lazy .*? ensures that we only match up to the first closing </p>.
In C#:
string resultString = Regex.Replace(yourstring, "^<p>.*?</p>", "");
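A quick way to see what the anchor does is to run the pattern on a string that starts with a paragraph and on one that does not. This is only a sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredRemover
{
    // Hypothetical helper: removes a paragraph only if it sits at the very start.
    public static string RemoveLeadingParagraph(string s)
    {
        // Without RegexOptions.Multiline, ^ matches only at the start of the string,
        // so at most one replacement can happen.
        return Regex.Replace(s, "^<p>.*?</p>", "");
    }

    public static void Main()
    {
        // Starts with <p>: the first paragraph is removed.
        Console.WriteLine("[" + RemoveLeadingParagraph("<p>Hello</p> <p>World</p>") + "]");
        // Does not start with <p>: the string comes back unchanged.
        Console.WriteLine("[" + RemoveLeadingParagraph("Hey! <p>Hello</p>") + "]");
    }
}
```

Because the dot does not match newlines by default, a first paragraph that spans several lines would need the (?s) flag or RegexOptions.Singleline.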
B. If First Paragraph Can Start Anywhere
(?s)(\A.*?)<p>.*?</p>
(?s) allows the dot to match newlines, in case your first paragraph occurs after the first line.
In (\A.*?), the \A asserts that we are at the beginning of the string, then the lazy .*? matches everything up to the first paragraph. This is all captured to Group 1.
<p>.*?</p> matches the paragraph.
The replacement returns Group 1, so the text before the paragraph is kept and only the paragraph itself is dropped.
Here is a full C# program to show how this works (see output at the bottom of the online demo).
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main() {
        var myRegex = new Regex(@"(?s)(\A.*?)<p>.*?</p>");
        string s1 = @"Hey! <p>Hello</p> <p>World</p>";
        string replaced = myRegex.Replace(s1, delegate(Match m) {
            return m.Groups[1].Value;
        });
        Console.WriteLine(replaced);
    } // END Main
} // END Program
Upvotes: 0
Reputation: 735
You could use a capture group to match each paragraph in the string, like this:
string input = @"<p>Hello</p> <p>World</p>";
string pattern = @"<p>(\w*)</p>";
MatchCollection matches = Regex.Matches(input, pattern);
// matches[0] contains <p>Hello</p>
// matches[1] contains <p>World</p>
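The snippet above only finds the paragraphs. One possible way to go from there to the requested result is to cut the first match out of the input using its Index and Length. This is a sketch with hypothetical names, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchBasedRemover
{
    // Hypothetical helper: removes the span covered by the first match of the pattern.
    public static string RemoveFirstMatch(string input)
    {
        // Same pattern as above; \w* only matches letters, digits and underscores.
        Match first = Regex.Match(input, @"<p>(\w*)</p>");
        if (!first.Success) return input;

        // first.Groups[1].Value would hold the inner text, e.g. "Hello".
        return input.Remove(first.Index, first.Length);
    }

    public static void Main()
    {
        // The space that separated the two paragraphs is left in place.
        Console.WriteLine("[" + RemoveFirstMatch("<p>Hello</p> <p>World</p>") + "]");
    }
}
```

Note that because \w* does not match spaces or punctuation, a paragraph such as <p>Hello there</p> is skipped by this pattern; a lazy .*? is more forgiving.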
Upvotes: 0