mattruma

Reputation: 16687

How to strip characters between HTML tags

I have the following HTML:

<h1>Text Text</h1>      <h2>Text Text</h2>

I am still getting a handle on regular expressions, and I am trying to create one that removes the whitespace between the tags.

I would like the final result to be:

<h1>Text Text</h1><h2>Text Text</h2>

Any help would be greatly appreciated!

UPDATE

I would like to strip out all whitespace between the tags: spaces, tabs, and newlines. So if I have:

<div>    <h1>Text Text</h1>      <h2>Text Text</h2>     </div>

I would like it to end up as:

<div><h1>Text Text</h1><h2>Text Text</h2></div>

Upvotes: 0

Views: 1264

Answers (3)

aquinas

Reputation: 23796

How about: Regex.Replace(str, @">\s+<", "><")? It replaces every run of whitespace that sits between a closing > and the next < with nothing.
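For reference, here is a minimal sketch of that one-liner wrapped in a small program. The class and method names (TagWhitespace, StripBetweenTags) are made up for illustration and are not part of the answer; the input is the sample from the question's update, with a tab and a line break added to exercise \s.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class TagWhitespace
{
    // Removes any run of whitespace (spaces, tabs, newlines) that sits
    // directly between a '>' and the next '<'.
    public static string StripBetweenTags(string html)
    {
        return Regex.Replace(html, @">\s+<", "><");
    }

    public static void Main()
    {
        string input = "<div>    <h1>Text Text</h1>\t\r\n  <h2>Text Text</h2>     </div>";
        Console.WriteLine(StripBetweenTags(input));
        // prints: <div><h1>Text Text</h1><h2>Text Text</h2></div>
    }
}
```

Whitespace inside element text (the space in "Text Text") is left alone, because the pattern only matches when nothing but whitespace lies between > and <.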

Upvotes: 0

Chris S

Reputation: 65466

One alternative to using a regex or string replace is the Html Agility Pack.

If you would rather stay with a regex, here's a rough guess:

/// <summary>
///  Regular expression built for C# on: Tue, Sep 1, 2009, 03:56:27 PM
///  Using Expresso Version: 3.0.2766, http://www.ultrapico.com
///  (quantifiers changed to lazy afterwards, see the note below)
///
///  A description of the regular expression:
///
///  <h1>
///      <h1>
///  [1]: A numbered capture group. [.+?]
///      Any character, one or more repetitions, as few as possible
///  </h1>
///      </h1>
///  Match expression but don't capture it. [\s*]
///      Whitespace, any number of repetitions
///  <h2>
///      <h2>
///  [2]: A numbered capture group. [.+?]
///      Any character, one or more repetitions, as few as possible
///  </h2>
///      </h2>
///
/// </summary>
// Note: with RegexOptions.Singleline a greedy .+ would run from the first
// <h1> to the last </h2> in the input, so the captures are lazy (.+?).
public static readonly Regex regex = new Regex(
      "<h1>(.+?)</h1>(?:\\s*)<h2>(.+?)</h2>",
    RegexOptions.Singleline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// This is the replacement string; use it as regex.Replace(input, regexReplace)
public static readonly string regexReplace =
      "<h1>$1</h1><h2>$2</h2>";

Upvotes: 0

John Feminella

Reputation: 311735

If it's just this specific case, here's a suitable regex to find the whitespace between the two heading tags:

Regex regexForBreaks = new Regex(@"h1>[\s]*<h2", RegexOptions.Compiled);
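Note that the pattern above also matches the "h1>" and "<h2" around the whitespace, so a replacement would have to put those back. One possible variation, sketched here as an assumption rather than as part of the original answer, uses a lookbehind and a lookahead so that only the whitespace itself is matched. The names HeadingGap and Close are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class HeadingGap
{
    // The lookbehind (?<=</h1>) and lookahead (?=<h2>) assert the surrounding
    // tags without consuming them, so the match is the whitespace alone.
    private static readonly Regex Gap =
        new Regex(@"(?<=</h1>)\s+(?=<h2>)", RegexOptions.Compiled);

    public static string Close(string html)
    {
        return Gap.Replace(html, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(Close("<h1>Text Text</h1>      <h2>Text Text</h2>"));
        // prints: <h1>Text Text</h1><h2>Text Text</h2>
    }
}
```

This still only handles the </h1> followed by <h2> case; whitespace between any other pair of tags is left untouched.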

However, I think a regex is the wrong approach here if this is a more general case. For example, it's possible for tags to be nested within other tags, and then your problem needs a little more detail to figure out the right answer. As Jamie Zawinski said, "Some people, when confronted with a problem, think, 'I know, I'll use regular expressions.' Now they have two problems."

Upvotes: 1
