user1263981

Reputation: 3157

Remove the first paragraph tag and its contents from a string

How can I remove the first occurring paragraph tag, together with its contents, from a string?

Actual String
<p>Hello</p> <p>World</p>

Result
<p>World</p>

One option is to find the positions of the first <p> and the first </p>, and then replace everything up to and including that </p> with "".

How can this be achieved with regex?

Upvotes: 1

Views: 3419

Answers (3)

hwnd

Reputation: 70750

Use the Regex.Replace method with its count argument (the maximum number of replacements) set to 1:

Regex rgx     = new Regex(@"<p>.*?</p>");
String input  = @"<p>Hello</p> <p>World</p>";
String result = rgx.Replace(input, "", 1); // count = 1: only the first match is replaced
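
Note that the call above leaves the space that followed the removed paragraph, so the result is " <p>World</p>" rather than "<p>World</p>". Here is a minimal sketch of one way to drop that whitespace as well and to let the paragraph span several lines. The class and method names are hypothetical, and the `\s*` suffix and `RegexOptions.Singleline` are my own assumptions about what you might want, not part of the snippet above:

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstParagraph
{
    // Hypothetical helper: removes the first <p>...</p> block and any
    // whitespace that immediately follows it.
    public static string RemoveFirst(string html)
    {
        // Singleline lets '.' match newlines; \s* consumes trailing whitespace.
        var rgx = new Regex(@"<p>.*?</p>\s*", RegexOptions.Singleline);
        return rgx.Replace(html, "", 1); // count = 1: replace only the first match
    }

    static void Main()
    {
        Console.WriteLine(RemoveFirst("<p>Hello</p> <p>World</p>")); // <p>World</p>
    }
}
```

If the input contains no <p>...</p> pair, the string comes back unchanged.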

Upvotes: 1

zx81

Reputation: 41848

Apart from the warnings about using regex to parse html...

A. If First Paragraph Always Starts at the Beginning of the String

  • Search: ^<p>.*?</p>
  • Replace: empty string
  • The ^ anchor asserts that we are at the beginning of the string.
  • The lazy .*? ensures that we only match up to the first closing </p>

In C#:

string resultString = Regex.Replace(yourstring, "^<p>.*?</p>", "");
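
As a quick check of what the anchor does, here is a small sketch with made-up sample strings (the class and method names are hypothetical). It shows that the pattern only fires when the paragraph sits at the very start of the string, which is why option B exists:

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredDemo
{
    // Hypothetical helper wrapping the anchored pattern from option A.
    public static string RemoveLeadingParagraph(string s)
    {
        return Regex.Replace(s, "^<p>.*?</p>", "");
    }

    static void Main()
    {
        // Paragraph at the very start: it is removed (the following space stays).
        Console.WriteLine("[" + RemoveLeadingParagraph("<p>Hello</p> <p>World</p>") + "]");
        // Text before the first paragraph: ^ cannot match, so nothing changes.
        Console.WriteLine("[" + RemoveLeadingParagraph("Hey! <p>Hello</p> <p>World</p>") + "]");
    }
}
```

The brackets are only there to make the leftover leading space visible in the first line of output.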

B. If First Paragraph Can Start Anywhere

  • Search: (?s)(\A.*?)<p>.*?</p>
  • Replace: in the delegate function, return Group 1.
  • (?s) allows the dot to match newlines in case your first paragraph occurs after the first line
  • In the (\A.*?) the \A asserts that we are at the beginning of the string, then the lazy .*? matches everything up to the first paragraph. This is all captured to Group 1.
  • <p>.*?</p> matches the paragraph
  • The replacement is Group 1, so the paragraph is deleted.

Here is a full C# program that shows how this works.

using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main() {
        var myRegex = new Regex(@"(?s)(\A.*?)<p>.*?</p>");
        string s1 = @"Hey! <p>Hello</p> <p>World</p>";

        string replaced = myRegex.Replace(s1, delegate(Match m) {
            return m.Groups[1].Value;
        });
        Console.WriteLine(replaced);

    } // END Main
} // END Program

Upvotes: 0

fantastik78

Reputation: 735

You could match each paragraph and capture its contents in a group, like this:

string input = @"<p>Hello</p> <p>World</p>";
string pattern = @"<p>(\w*)</p>";
MatchCollection matches = Regex.Matches(input, pattern);
// matches[0] contains <p>Hello</p>
// matches[1] contains <p>World</p>
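
To actually remove the first paragraph from there, one possibility is to cut the first match out of the input using its position and length. This is only a sketch: the class and method names are hypothetical, and it keeps the `\w*` pattern from above, which matches only paragraphs made of word characters (no spaces or punctuation inside the tag).

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstMatchRemover
{
    // Hypothetical helper: cuts the first <p>...</p> match out of the input.
    public static string RemoveFirstMatch(string input)
    {
        Match first = Regex.Match(input, @"<p>(\w*)</p>");
        if (!first.Success)
        {
            return input; // no paragraph found, nothing to remove
        }
        // Remove exactly the matched span; surrounding text is left as it is.
        return input.Remove(first.Index, first.Length);
    }

    static void Main()
    {
        Console.WriteLine(RemoveFirstMatch("<p>Hello</p> <p>World</p>"));
    }
}
```

The space between the two paragraphs is not part of the match, so it stays in the result; call Trim() on the returned string if you do not want it.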

Upvotes: 0
