grimsan55

Reputation: 265

Decode a textfile

So I've saved the following to a text file, which I then read into my C# program as a list, and then I converted the list to a string. Now I want to decode the string and strip out all the HTML, but I haven't managed to. Does anyone know how? Here is the text to format:

<p> <span style="font-size: 18px;"><strong>Varifr&aring;n kommer den svarta m&auml;rren&nbsp; i Sm&aring;land?</strong></span></p>
<p> <span style="font-size: 14px;"><input checked="checked" name="ruta1" type="checkbox" value="Svar 1" />&nbsp;Fr&aring;n Tyskland</span></p>
<p> <input type="checkbox" />Fr&aring;n Belgien</p>
<p> &nbsp;</p>
<p> <input type="checkbox" />&nbsp;Fr&aring;n Turkiet</p>
<p>  &nbsp;</p>
<p>  &nbsp;</p>
<p>  &nbsp;</p>
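using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;

public partial class Form1 : Form
{
    string temp = "TextKod.txt";
    string line = "";
    List<string> texten = new List<string>();
    string vetEj;
    string hoppSan;

    public Form1()
    {
        InitializeComponent();

        // "using" closes the file once reading is finished.
        using (StreamReader sr = new StreamReader(temp))
        {
            // ReadLine() already strips the line terminator, so Split('\r')
            // always yields a single element here.
            while ((line = sr.ReadLine()) != null)
            {
                string[] myarray = line.Split('\r');
                vetEj = myarray[0];
                texten.Add(vetEj);
            }
        }

        // All lines joined into one string, separated by '\r'.
        hoppSan = string.Join("\r", texten);
    }
}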
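A shorter way to build the same hoppSan string is sketched below, assuming the file fits comfortably in memory. TextFileLoader and LoadJoined are hypothetical names. File.ReadAllLines opens, reads and closes the file in one call and already splits on "\r\n", "\n" or "\r", so the manual Split('\r') is not needed.

```csharp
using System.IO;

// Hypothetical helper: should produce the same string as the loop in Form1.
public static class TextFileLoader
{
    // ReadAllLines handles opening/closing the file and removes line terminators;
    // the lines are then joined with '\r', exactly like string.Join("\r", texten).
    public static string LoadJoined(string path)
    {
        string[] lines = File.ReadAllLines(path);
        return string.Join("\r", lines);
    }
}
```

Usage inside the constructor would be hoppSan = TextFileLoader.LoadJoined(temp);. If the '\r' separators are not important, File.ReadAllText(temp) returns the whole file, original line breaks included.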

Upvotes: 0

Views: 80

Answers (2)

Mike Perrenoud

Reputation: 67898

I think what you really want is to encode the string, but either way, add a reference to System.Web and use the HttpUtility class. To decode:

HttpUtility.HtmlDecode(htmlString);

and to encode:

HttpUtility.HtmlEncode(htmlString);
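To see what each call does, here is a small sketch applied to one line of the sample file. DecodeVsEncode is a made-up name; on .NET Framework it assumes a reference to System.Web.dll, while on .NET Core and later HttpUtility is available without one. Note that HtmlDecode only converts entities: the tags are still there afterwards.

```csharp
using System.Web; // HttpUtility (on .NET Framework: add a reference to System.Web.dll)

// Hypothetical demo class.
public static class DecodeVsEncode
{
    // One line taken from the question's TextKod.txt.
    public const string Line = "<p> <input type=\"checkbox\" />Fr&aring;n Belgien</p>";

    // HtmlDecode turns entities back into characters (&aring; -> å) but keeps the tags.
    public static string Decoded => HttpUtility.HtmlDecode(Line);

    // HtmlEncode goes the other way: <, >, & and " become &lt;, &gt;, &amp; and &quot;.
    public static string Encoded => HttpUtility.HtmlEncode(Line);
}
```

Decoded gives <p> <input type="checkbox" />Från Belgien</p>, and Encoded gives &lt;p&gt; &lt;input type=&quot;checkbox&quot; /&gt;Fr&amp;aring;n Belgien&lt;/p&gt;.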

To get rid of all HTML tags, do this:

var cleanHtml = Regex.Replace(htmlString, "<.*?>", "");
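The regex can be wrapped in a helper like the one below (TagStripper and StripTags are hypothetical names). It is a sketch for simple markup like the sample, not a real HTML parser: the lazy .*? stops at the first '>', and because '.' does not match '\n' by default, a tag broken across two lines is left alone.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's regex.
public static class TagStripper
{
    // "<.*?>" lazily matches from '<' to the nearest '>', i.e. one tag at a time.
    // '.' does not match '\n' (no RegexOptions.Singleline), so tags spanning lines survive.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<.*?>", "");
    }
}
```

For the Belgien line, StripTags returns " Fr&aring;n Belgien": the tags are gone, but the leading space and the entity remain.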

You could change the regex to <.*?>|&.*?; to also remove those &nbsp; entities, but that pattern also matches the &aring; in Fr&aring;n Tyskland (leaving "Frn Tyskland"), so that's up to you.
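The trade-off can be seen side by side in the sketch below (EntityHandling and both method names are hypothetical). Instead of deleting entities, the second method removes the tags first and then decodes what is left with WebUtility.HtmlDecode from System.Net, so &aring; becomes å and &nbsp; becomes a non-breaking space (U+00A0), which Trim() removes at the ends.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical comparison of the two approaches.
public static class EntityHandling
{
    // The answer's variant: deletes every tag *and* every entity.
    public static string StripTagsAndEntities(string html)
    {
        return Regex.Replace(html, "<.*?>|&.*?;", "").Trim();
    }

    // Alternative: delete tags, then decode the remaining entities instead of deleting them.
    public static string StripTagsThenDecode(string html)
    {
        string noTags = Regex.Replace(html, "<.*?>", "");
        return WebUtility.HtmlDecode(noTags).Trim();
    }
}
```

On "<p>&nbsp;Fr&aring;n Tyskland</p>", the first method returns "Frn Tyskland" and the second returns "Från Tyskland".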

Upvotes: 1

Rajesh Subramanian

Reputation: 6490

If you are using .NET 4.0+, you can also use WebUtility.HtmlDecode, which does not require an extra assembly reference because it is available in the System.Net namespace.
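As a sketch of how this could be applied to the whole hoppSan string from the question (assuming its lines are joined with "\r" as in the question's code), the helper below strips tags, decodes entities with WebUtility.HtmlDecode, collapses whitespace and drops lines that end up empty. HtmlText and ToPlainLines are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: converts the HTML snippet into plain-text lines.
public static class HtmlText
{
    public static List<string> ToPlainLines(string html)
    {
        var result = new List<string>();
        // The question joins lines with "\r"; accept "\n" as well.
        string[] rawLines = html.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string rawLine in rawLines)
        {
            // 1. Remove tags first, so a decoded "&lt;" cannot be mistaken for a tag.
            string noTags = Regex.Replace(rawLine, "<.*?>", "");
            // 2. Decode entities: &aring; -> å, &auml; -> ä, &nbsp; -> U+00A0.
            string decoded = WebUtility.HtmlDecode(noTags);
            // 3. Collapse runs of whitespace (in .NET, \s also matches U+00A0) and trim.
            string clean = Regex.Replace(decoded, @"\s+", " ").Trim();
            // 4. Skip lines that were only markup or &nbsp;.
            if (clean.Length > 0)
                result.Add(clean);
        }
        return result;
    }
}
```

For the sample file this should yield four lines: the question text and the three answers Från Tyskland, Från Belgien and Från Turkiet.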

This could also help:

string myEncodedString = HttpUtility.HtmlEncode(htmlString);

Upvotes: 0
