Benny

C#: Remove content within HTML tags (no regex)

I want to remove the HTML tags from the text in textBox1 and display the result in textBox2. I need to find the positions of "<" and ">", then delete each tag along with everything between those two characters. I don't want to use regex.

Here's what I've got so far:

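        string input = textBox1.Text;
        string output = textBox2.Text;
        string results;
        for (int i = 0; i < input.Length; i++)
        {
            // Is there another '<' at or after position i?
            if (input.IndexOf('<', i) != -1)
            {
                // Not sure how to find the matching '>' here and
                // remove everything from the '<' to the '>'.
            }
        }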
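
For comparison, here is a minimal sketch of the IndexOf-based approach described above: instead of examining every character, it jumps from each '<' to the next '>' and copies only the text between tags. It is written as a C# 9+ top-level program, and StripTagsWithIndexOf is a hypothetical name. Like any approach that only scans for '<' and '>', it is not an HTML parser; for example, a '>' inside a quoted attribute value would end the tag early.

```csharp
using System;
using System.Text;

// Hypothetical helper: removes every "<...>" span by jumping between
// IndexOf results instead of testing each character individually.
static string StripTagsWithIndexOf(string input)
{
    var output = new StringBuilder(input.Length);
    int pos = 0;

    while (pos < input.Length)
    {
        int open = input.IndexOf('<', pos);
        if (open == -1)
        {
            // No more tags: copy the rest of the string and stop.
            output.Append(input, pos, input.Length - pos);
            break;
        }

        // Copy the text that comes before this tag.
        output.Append(input, pos, open - pos);

        int close = input.IndexOf('>', open + 1);
        if (close == -1)
        {
            // Unclosed '<': treat the rest of the input as part of the tag and drop it.
            break;
        }

        // Resume scanning just after the closing '>'.
        pos = close + 1;
    }

    return output.ToString();
}

Console.WriteLine(StripTagsWithIndexOf("<b>Hello</b> world")); // Hello world
```

In a WinForms handler you would call it as textBox2.Text = StripTagsWithIndexOf(textBox1.Text);. Copying whole runs with Append(string, int, int) avoids per-character work on long stretches of plain text.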

Answers (1)

Dan Herbert

This should do what you're looking for. However, it won't handle malformed markup. For example, given the input string "Hello < world", the output would be just "Hello ", because everything after the unmatched < is treated as part of a tag and discarded.

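// Requires "using System.Text;" at the top of the file for StringBuilder.
string input = textBox1.Text;
StringBuilder output = new StringBuilder(input.Length);
bool inATag = false; // true while we are between a '<' and its '>'

for (var i = 0; i < input.Length; i++) {
    if (input[i] == '<') {
        // Start of a tag: stop copying characters.
        inATag = true;
    } else if (input[i] == '>') {
        // End of a tag: resume copying with the next character.
        inATag = false;
    } else if (!inATag) {
        // Ordinary text outside any tag: keep it.
        output.Append(input[i]);
    }
}

textBox2.Text = output.ToString();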

To explain a little more about what's going on, I'm iterating through the input string one character at a time. If I find an opening <, I enter a state where I will not add any of the input to the output until I find the closing >.
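
The "Hello < world" limitation comes from the single inATag flag: once a '<' is seen, nothing is copied until a '>' appears, even if none ever does. Below is a minimal sketch of one possible workaround, not part of the original answer: it remembers where the current tag started and, if the input ends (or another '<' arrives) before a closing '>', writes that text back out literally. StripTagsLenient is a hypothetical name, and this is still a heuristic rather than an HTML parser. Unlike the loop above, it also keeps a stray '>' that appears outside any tag.

```csharp
using System;
using System.Text;

// Hypothetical variant of the answer's loop: a '<' that is never closed
// (or is followed by another '<' before any '>') is kept as literal text.
static string StripTagsLenient(string input)
{
    var output = new StringBuilder(input.Length);
    int tagStart = -1; // index of the '<' that opened the current tag, or -1

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        if (c == '<')
        {
            if (tagStart != -1)
            {
                // The previous '<' was never closed: emit it and the text after it.
                output.Append(input, tagStart, i - tagStart);
            }
            tagStart = i;
        }
        else if (c == '>' && tagStart != -1)
        {
            tagStart = -1; // tag closed: everything from '<' to '>' is dropped
        }
        else if (tagStart == -1)
        {
            output.Append(c); // ordinary text, including a stray '>'
        }
    }

    // Input ended inside an unclosed "tag": it was literal text after all.
    if (tagStart != -1)
    {
        output.Append(input, tagStart, input.Length - tagStart);
    }

    return output.ToString();
}

Console.WriteLine(StripTagsLenient("Hello < world")); // Hello < world
Console.WriteLine(StripTagsLenient("<p>1 < 2</p>"));  // 1 < 2
```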

I build the output with a StringBuilder rather than writing output += input[i] on a plain string, because it performs much better here. Strings in .NET are immutable, so every time you concatenate two strings, a completely new string is allocated and all the characters accumulated so far are copied into it. Doing that once per character makes the loop's cost grow roughly with the square of the input length. A StringBuilder instead appends into an internal buffer (here pre-sized to input.Length, so it never has to grow), and only one string is allocated, when ToString() is called at the end.
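
A rough way to see the difference is to build the same string both ways and time it. This sketch is a C# 9+ top-level program; the 20,000-character input is an arbitrary choice, and the printed timings are illustrative only and will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Build the same string two ways so the cost difference can be observed.
string input = new string('x', 20_000);

var sw = Stopwatch.StartNew();
string concatenated = "";
foreach (char c in input)
{
    // Each += allocates a new string and copies every character so far.
    concatenated += c;
}
sw.Stop();
Console.WriteLine($"string +=     : {sw.ElapsedMilliseconds} ms");

sw.Restart();
var builder = new StringBuilder(input.Length); // capacity preset: the buffer never regrows
foreach (char c in input)
{
    builder.Append(c); // writes into the existing buffer
}
string built = builder.ToString(); // the single final string allocation
sw.Stop();
Console.WriteLine($"StringBuilder : {sw.ElapsedMilliseconds} ms");

Console.WriteLine(concatenated == built); // True
```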

Microsoft has written a good explanation of why to use a StringBuilder, but the general rule is: use a StringBuilder whenever you find yourself concatenating strings inside a loop.

Conversely, when you know that only a small number of strings will be joined, it is better not to use a StringBuilder. Creating a StringBuilder has a fixed cost that isn't recovered if you only concatenate a few strings; for example, if you expect to do only about 10 concatenations, reaching for a StringBuilder would be considered an anti-pattern. However, if you're concatenating hundreds of pieces, as this loop does with one Append per character of the input, it is a very good candidate for a StringBuilder.
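
For the small, fixed case, plain concatenation is both simpler and cheaper. In this sketch (FormatName is a hypothetical helper, written as a C# 9+ top-level program), the C# compiler turns the single expression first + " " + last into one string.Concat call, so only the final string is allocated.

```csharp
using System;

// A handful of known pieces: plain concatenation is the simplest choice.
// The compiler emits a single string.Concat(first, " ", last) call here,
// so no intermediate strings are created.
static string FormatName(string first, string last) => first + " " + last;

Console.WriteLine(FormatName("Ada", "Lovelace")); // Ada Lovelace
```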
