Abhishek

Reputation: 191

Convert HTML text to Plain text

I have a text area that accepts HTML markup, so any HTML code can be entered in it.

Now I want to convert that HTML to plain text without using a third-party tool. How can this be done?

Currently I am doing it like this:

var desc = Convert.ToString(Html.Raw(Convert.ToString(drJob["Description"])));

drJob is the DataRow from which I fetch the description, and I want to convert that description to plain text.

Upvotes: 1

Views: 4902

Answers (4)

Meh

Reputation: 607

using System.Text.RegularExpressions;

private void button1_Click(object sender, EventArgs e)
{
    string sauce = htm.Text; // htm = your html box
    // Match each run of text that sits between tags (or at the start/end of the input)
    Regex myRegex = new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);
    foreach (Match iMatch in myRegex.Matches(sauce))
    {
        txt.AppendText(Environment.NewLine + iMatch.Value); // txt = your destination box
    }
}

Let me know if you need more clarification.

[EDIT:] Be aware that this is not a clean function, so add a step to strip out empty strings and stray line breaks. Extracting the text between tags should work fine, though. If you want to keep the code short, use the regex and see whether it works for you. The person who posted about regex not being clean is right, and there may be other ways; regex usually works best when pulling a single type of tag out of HTML. (I use it with Rainmeter to parse content and have never had any issues.)
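The clean-up step mentioned in the edit could look something like the following. This is only a sketch: the class and method names (TagTextExtractor, Extract) are made up for illustration, and it returns a string instead of writing to a text box. It reuses the same regex, trims each match, and skips the ones that contain nothing but whitespace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTextExtractor
{
    // Same pattern as in the answer: text between tags or at the input edges.
    private static readonly Regex BetweenTags =
        new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);

    // Hypothetical helper: one non-empty text run per line.
    public static string Extract(string html)
    {
        var lines = new List<string>();
        if (string.IsNullOrEmpty(html)) return string.Empty;

        foreach (Match m in BetweenTags.Matches(html))
        {
            string piece = m.Value.Trim();
            // Drop matches that were only spaces or line breaks.
            if (piece.Length > 0) lines.Add(piece);
        }
        return string.Join("\n", lines);
    }
}
```

For example, TagTextExtractor.Extract("<p>Hello</p>\n<p>World</p>") gives "Hello" and "World" on separate lines, without the empty entry for the line break between the two paragraphs.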

Upvotes: 0

MrBassam

Reputation: 359

You can replace the HTML tags with an empty string using System.Text.RegularExpressions.Regex:

String desc = Regex.Replace(drJob["Description"].ToString(), @"<[^>]*>", String.Empty);
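Stripping the tags alone leaves HTML entities such as &amp; in the result, and the text of adjacent elements can run together. Here is a minimal sketch of one way to handle both, assuming .NET 4.0 or later for System.Net.WebUtility. The names HtmlText and ToPlainText are made up for illustration. Like any regex approach, it will mis-handle script/style blocks and a ">" inside attribute values.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: strips tags, decodes entities, tidies whitespace.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Replace each tag with a space so neighbouring words do not fuse.
        string text = Regex.Replace(html, @"<[^>]*>", " ");

        // Turn entities such as &amp; and &lt; back into characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

It would be called as string desc = HtmlText.ToPlainText(drJob["Description"].ToString());. Replacing tags with a space rather than an empty string is a trade-off: it keeps "<p>one</p><p>two</p>" from becoming "onetwo", but it also splits a word that has an inline tag in the middle of it.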

Upvotes: 1

Polity

Reputation: 15130

There is no built-in way to do this in .NET. You either need to resort to a third-party tool such as Html Agility Pack, or do it in JavaScript:

document.getElementById('myTextContainer').innerText = document.getElementById('myMarkupContainer').innerText;

For your safety, don't use a regex. ( http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html )
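To make the warning concrete, here is a small sketch of where a simple tag-stripping regex goes wrong. The pattern is the common "<[^>]*>" one, and the class name RegexPitfalls and the sample inputs are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfalls
{
    // Naive approach: delete everything that looks like a tag.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", string.Empty);
    }
}
```

On simple markup this behaves: StripTags("<p>Hello <b>world</b></p>") returns "Hello world". A ">" inside an attribute value ends the match too early, so StripTags("<a title=\"x > y\">link</a>") returns " y\">link" with part of the attribute leaking into the text. Script content is not treated specially either: StripTags("<script>if (a < b) alert(1);</script>Hi") returns "if (a Hi", which keeps part of the script and swallows the rest.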

Upvotes: 2

Jerome Cance

Reputation: 8183

You can simply use a replace method with the regex "<[^>]+>".

Upvotes: 0
