R.Vector

Reputation: 1719

Parsing invalid characters to XML

The idea of the application is simple: it is given a path and writes each file's path into an XML file. The problem I am facing is that a file name can contain invalid characters, which makes the application stop working. Here is the code I use to write the file information into XML:

    // the collecting details method
    private void Get_Properties(string path)
    {
        // Load the XML File
        XmlDocument xml = new XmlDocument();
        xml.Load("Details.xml");

        foreach (string eachfile in Files)
        {
            try
            {
                FileInfo Info = new FileInfo(eachfile);

                toolStripStatusLabel1.Text = "Adding : " + Info.Name;

                // Create the Root element
                XmlElement ROOT = xml.CreateElement("File");

                if (checkBox1.Checked)
                {
                    XmlElement FileName = xml.CreateElement("FileName");
                    FileName.InnerText = Info.Name;
                    ROOT.AppendChild(FileName);
                }

                if (checkBox2.Checked)
                {
                    XmlElement FilePath = xml.CreateElement("FilePath");
                    FilePath.InnerText = Info.FullName;
                    ROOT.AppendChild(FilePath);
                }

                if (checkBox3.Checked)
                {
                    XmlElement ModificationDate = xml.CreateElement("ModificationDate");
                // LastWriteTime is the modification time; LastAccessTime is the last read
                string lastModification = Info.LastWriteTime.ToString();
                    ModificationDate.InnerText = lastModification;
                    ROOT.AppendChild(ModificationDate);
                }

                if (checkBox4.Checked)
                {
                    XmlElement CreationDate = xml.CreateElement("CreationDate");
                    string Creation = Info.CreationTime.ToString();
                    CreationDate.InnerText = Creation;
                    ROOT.AppendChild(CreationDate);
                }

                if (checkBox5.Checked)
                {
                    XmlElement Size = xml.CreateElement("Size");
                    Size.InnerText = Info.Length.ToString() + " Bytes";
                    ROOT.AppendChild(Size);
                }

                xml.DocumentElement.InsertAfter(ROOT, xml.DocumentElement.LastChild);

                // +1 step in progressbar
                toolStripProgressBar1.PerformStep();
                success_counter++;
                Thread.Sleep(10);
            }
            catch (Exception ee)
            {
                toolStripProgressBar1.PerformStep();

                error_counter++;
            }
        }

        toolStripStatusLabel1.Text = "Now Writing the Details File";

        xml.Save("Details.xml");

        toolStripStatusLabel1.Text = success_counter + " items have been added and " + error_counter + " items have failed, total files processed (" + Files.Count + ")";

        Files.Clear();
    }

Here is what the XML looks like after the details are generated:

<?xml version="1.0" encoding="utf-8"?>
 <Files>
  <File>
    <FileName>binkw32.dll</FileName>
    <FilePath>D:\ALL DLLS\binkw32.dll</FilePath>
    <ModificationDate>3/31/2012 5:13:56 AM</ModificationDate>
    <CreationDate>3/31/2012 5:13:56 AM</CreationDate>
    <Size>286208 Bytes</Size>
  </File>
 <File>

Examples of file names I would like to write to XML without issues:

BX]GC^O^_nI_C{jv_rbp&1b_H âo&psolher d) doိiniᖭ

icon_Áq偩侉₳㪏ံ�ぞ鵃_䑋屡1]

MAnaFor줡�

EDIT [PROBLEM SOLVED]

All I had to do was: 1. convert the file name to UTF-8 bytes, then 2. convert the UTF-8 bytes back to a string.

Here is the method :

byte[] FilestoBytes = System.Text.Encoding.UTF8.GetBytes(Info.Name);
string utf8 = System.Text.Encoding.UTF8.GetString(FilestoBytes);
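A likely reason this round trip helps (an assumption, since the post does not say which characters failed): `Encoding.UTF8` uses a replacement fallback, so an unpaired surrogate in the file name comes back as U+FFFD, which XML accepts. The sketch below illustrates that behavior; the class and method names are hypothetical. Note that control characters such as U+0001 pass through unchanged, so the round trip would not help with those.

```csharp
using System;
using System.Text;

static class Utf8RoundTripDemo
{
    // Encodes to UTF-8 and decodes again. With the default Encoding.UTF8,
    // text that cannot be encoded (an unpaired surrogate) is replaced by U+FFFD.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        // Hypothetical file name ending in a high surrogate with no low surrogate.
        string broken = "icon_\uD83D";
        Console.WriteLine(RoundTrip(broken) == "icon_\uFFFD");

        // A control character survives the round trip untouched.
        Console.WriteLine(RoundTrip("a\u0001b") == "a\u0001b");
    }
}
```

Both comparisons should print `True`. In the question's code, the resulting `utf8` string would presumably be assigned to `FileName.InnerText` in place of `Info.Name`.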

Upvotes: 1

Views: 2683

Answers (3)

Habib

Reputation: 223187

Illegal characters in XML are &, < and > (as well as " or ' in attributes)

On Windows file systems, only & and ' of these can appear in a file name (<, > and " are not allowed in file names).

When saving XML you can escape these characters. For example, & must be written as &amp;
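As a minimal sketch of manual escaping, assuming the XML text is being built by hand rather than through `XmlDocument`: the framework's `SecurityElement.Escape` replaces the five special characters with their entities. The wrapper class and method name here are hypothetical.

```csharp
using System;
using System.Security;

static class EscapeDemo
{
    // Replaces &, <, >, " and ' with the corresponding XML entities.
    public static string EscapeForXml(string text)
    {
        return SecurityElement.Escape(text);
    }

    static void Main()
    {
        Console.WriteLine(EscapeForXml("You & Me"));
        Console.WriteLine(EscapeForXml("Tom's file"));
    }
}
```

This only handles characters that have an escaped form; it does nothing for characters that XML does not allow at all.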

Upvotes: 1

jorgebg

Reputation: 6600

The XML is probably malformed. XML files cannot contain certain characters unless they are escaped. For example, this is not valid:

<dummy>You & Me</dummy>

Instead you should use:

<dummy>You &amp; Me</dummy>

Illegal characters in XML are &, < and > (as well as " or ' in attributes)
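For what it is worth, code that sets `InnerText` on an `XmlElement`, as the question's code does, gets this escaping for free. The sketch below is a small check of that, with a hypothetical helper name and sample file name.

```csharp
using System;
using System.Xml;

static class InnerTextDemo
{
    // Builds a <FileName> element the same way the question does and
    // returns its serialized form, so the escaping is visible.
    public static string BuildElementXml(string fileName)
    {
        XmlDocument xml = new XmlDocument();
        XmlElement element = xml.CreateElement("FileName");
        element.InnerText = fileName;
        return element.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(BuildElementXml("You & Me.txt"));
    }
}
```

The serialized element contains `&amp;` in place of `&`, which suggests that plain `&` or `'` in a file name is unlikely to be what breaks the question's code.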

Upvotes: 2

Jon Skeet

Reputation: 1499770

It's not clear which of your characters you're having problems with. So long as you use the XML API (instead of trying to write the XML out directly yourself) you should be fine with any valid text (broken surrogate pairs would probably cause an issue) but what won't be valid is Unicode code points less than space (U+0020), aside from tab, carriage return and line feed. They're simply not catered for in XML.
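One way to act on this, sketched under assumptions rather than taken from the answer: filter the file name before assigning it to `InnerText`, keeping valid surrogate pairs and replacing anything XML 1.0 does not allow. `XmlConvert.IsXmlChar` is available from .NET 4.0; the class name, method name, and the `_` replacement character are hypothetical choices.

```csharp
using System;
using System.Text;
using System.Xml;

static class XmlTextSanitizer
{
    // Returns a copy of text in which every character that is not valid
    // in XML 1.0 (control characters other than tab, CR and LF, and
    // unpaired surrogates) is replaced by the given replacement character.
    public static string Sanitize(string text, char replacement = '_')
    {
        StringBuilder result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsSurrogatePair(text, i))
            {
                // Valid pair: keep both halves and skip the low surrogate.
                result.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                result.Append(c);
            }
            else
            {
                // Control character or broken surrogate.
                result.Append(replacement);
            }
        }
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Sanitize("a\u0001b"));          // control character replaced
        Console.WriteLine(Sanitize("x\uD83D") == "x_");   // lone high surrogate replaced
    }
}
```

In the question's loop this would be used as `FileName.InnerText = XmlTextSanitizer.Sanitize(Info.Name);`. Replacing rather than dropping characters keeps the stored name the same length as the original, though the stored name then no longer matches the file on disk exactly.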

Upvotes: 3
