Minh Nguyen

Reputation: 101

Replace HTML tag content using Regex

I want to encrypt the text content of an HTML document without changing its layout. The content is stored between pairs of tags, like this: <span style...>text_to_get</span>. My idea is to use a regex to (1) retrieve each text part and (2) replace it with the encrypted text. I have completed step (1) but am having trouble with step (2). Here is the code I'm working on:

private string encryptSpanContent(string text, string passPhrase, string salt, string hash, int iteration, string initialVector, int keySize)
{
    string resultText = text;
    string pattern = "<span style=(?<style>.*?)>(?<content>.*?)</span>";
    Regex regex = new Regex(pattern);
    MatchCollection matches = regex.Matches(resultText);
    foreach (Match match in matches)
    {
        string replaceWith = "<span style=" + match.Groups["style"] + ">" + AESEncryption.Encrypt(match.Groups["content"].Value, passPhrase, salt, hash, iteration, initialVector, keySize) + "</span>";
        resultText = regex.Replace(resultText, replaceWith);
    }
    return resultText;
}

Is this the line that is wrong? It seems to replace the text of every span with the last replaceWith value:

            resultText = regex.Replace(resultText, replaceWith);

Can anybody help me fix this?

Upvotes: 3

Views: 7697

Answers (2)

Hanzala Ali

Reputation: 19

Here is a simple solution for replacing HTML tags:

string ReplaceBreaks(string value)
{
    return Regex.Replace(value, @"<(.|\n)*?>", string.Empty);
}

Upvotes: -2

Ahmad Mageed

Reputation: 96477

It's recommended that you use the HTML Agility Pack if you're going to work with HTML, since you might run into issues with regex, especially on nested tags or malformed HTML.
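For the HTML Agility Pack route, a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced, and it uses a hypothetical FakeEncrypt (plain Base64) as a stand-in for the asker's AESEncryption.Encrypt, which is not shown in the question. Only the text nodes that are direct children of a styled span are replaced, so the surrounding markup is left alone; entity decoding and encoding are deliberately left out.

```csharp
using System;
using System.Text;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is installed

public static class SpanEncryptorHap
{
    // Hypothetical stand-in for AESEncryption.Encrypt (not shown in the question).
    public static string FakeEncrypt(string plain) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(plain));

    public static string EncryptSpanContent(string html, Func<string, string> encrypt)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text nodes directly inside <span style="...">.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//span[@style]/text()");
        if (textNodes != null)
        {
            foreach (HtmlNode node in textNodes)
            {
                // Swap each text node for a new one holding the encrypted text.
                HtmlNode replacement = doc.CreateTextNode(encrypt(node.InnerText));
                node.ParentNode.ReplaceChild(replacement, node);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string input = "<div><span style=\"color: #000;\">hello, world!</span></div>";
        Console.WriteLine(EncryptSpanContent(input, FakeEncrypt));
    }
}
```

Passing the encryption routine in as a Func keeps the sketch independent of the real AES parameters; in the asker's code it would be a lambda that calls AESEncryption.Encrypt with passPhrase, salt and the rest.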

Assuming your HTML is well-formed and you decide to use a regex, you should use the Regex.Replace method that accepts a MatchEvaluator to replace all occurrences.

Try this approach:

string input = @"<div><span style=""color: #000;"">hello, world!</span></div>";
string pattern = @"(?<=<span style=""[^""]+"">)(?<content>.+?)(?=</span>)";
string result = Regex.Replace(input, pattern,
    m => AESEncryption.Encrypt(m.Groups["content"].Value, passPhrase, salt, hash, iteration, initialVector, keySize));

Here I use a lambda expression for the MatchEvaluator and refer to the "content" group as shown above. I also use look-arounds for the span tags to avoid having to include them in the replacement pattern.
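To see the MatchEvaluator approach handle more than one span, here is a self-contained sketch that can be run as-is. FakeEncrypt is a hypothetical stand-in (plain Base64) for AESEncryption.Encrypt, which is not shown in the question, and the class and method names are made up for illustration. The evaluator is called once per match, so each span gets its own encrypted text rather than the last value computed in a loop.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SpanEncryptor
{
    // Hypothetical stand-in for AESEncryption.Encrypt (not shown in the question).
    public static string FakeEncrypt(string plain) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(plain));

    public static string EncryptSpanContent(string html, Func<string, string> encrypt)
    {
        // Look-behind and look-ahead keep the tags out of the match,
        // so only the text between them is replaced.
        const string pattern =
            @"(?<=<span style=""[^""]+"">)(?<content>.+?)(?=</span>)";

        // The lambda runs once for every match and returns that match's replacement.
        return Regex.Replace(html, pattern, m => encrypt(m.Groups["content"].Value));
    }

    public static void Main()
    {
        string input =
            "<div><span style=\"color: #000;\">one</span>" +
            "<span style=\"color: #fff;\">two</span></div>";
        Console.WriteLine(EncryptSpanContent(input, FakeEncrypt));
    }
}
```

With the real encryption routine, the second argument would be a lambda such as s => AESEncryption.Encrypt(s, passPhrase, salt, hash, iteration, initialVector, keySize). Note that the pattern, like the one above, assumes double-quoted style attributes and no nested spans.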

Upvotes: 3
