Luka Zdravkovic

Writing special characters to an XML document with C#

I'm trying to write an XML file in C# using XmlWriter in a recursive function. The file is supposed to contain every single folder in a given directory as well as every subfolder and file.

I'm having trouble writing special characters to the XML file. It keeps throwing an error saying that I can't use characters such as '&', '/', '-', '.' and ' ' (space).

Even digits don't work. I have looked for similar questions, but none of the solutions helped. I have tried replacing the special characters in the folder and file names with escapes such as "&amp;", "&quot;" and "&apos;", but that doesn't work either: I just get an error saying that I can't use '&'.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.IO;
    using System.Xml;

    namespace XMLgenerator
    {
        public class Generator
        {
            public void write(string Dir, XmlWriter writer)
            {
                try
                {
                    writer.WriteStartElement("Folders");
                    foreach (string s in Directory.GetDirectories(Dir))
                    {
                        string[] splitter = s.Split('\\');
                        string ss = splitter[splitter.Length - 1];
                        string ssxml = XmlConvert.EncodeLocalName(ss);
                        writer.WriteStartElement("Folder");
                        writer.WriteAttributeString("name", ssxml);

                        foreach (string f in Directory.GetFiles(s))
                        {
                            string fxml = XmlConvert.EncodeLocalName(f);
                            FileInfo fi = new FileInfo(f);
                            long length = fi.Length;
                            writer.WriteElementString(fxml, length.ToString());
                        }
                        writer.WriteEndElement();
                        write(s, writer);
                    }
                    writer.WriteEndElement();
                }
                catch (UnauthorizedAccessException ex)
                {
                    Console.WriteLine(ex.Message);
                    return;
                }
                catch (IOException ex)
                {
                    Console.WriteLine(ex.Message);
                    return;
                }
            }

            // Creates test.xml and fills it with the directories and files found under Dir (the directory path to scan).
            public void generateContent(string Dir)
            {
                XmlWriterSettings xws = new XmlWriterSettings();
                xws.Encoding = new UTF8Encoding();
                using (XmlWriter writer = XmlWriter.Create("test.xml", xws))
                {
                    writer.WriteStartDocument();
                    write(Dir, writer);
                    writer.WriteEndDocument();
                }
            }
        }
    }

Answers (2)

dbc

You are trying to include '&', '/', '-', '.', ' ' and other characters in an XML element name. Some of these, such as '&', '/' and ' ', cannot appear in an element name at all, while others, such as '-', '.' and digits, can appear but not as the first character. The XML 1.0 Fourth Edition specification (the version currently supported by XmlWriter) defines the valid characters in a name as follows:

    [4]     NameChar    ::=     Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    [5]     Name        ::=     (Letter | '_' | ':') (NameChar)*

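Here Letter, Digit and the other character classes are defined in Appendix B (Character Classes) of the specification. Note that a name must start with a letter, an underscore or a colon; digits, '.' and '-' are only allowed after the first character.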
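
To check a candidate name against these rules before writing it, `XmlConvert.VerifyName` can be used: it returns the name when it is valid and throws an `XmlException` otherwise. The sketch below assumes C# 9 or later (top-level statements); `IsValidElementName` and the sample names are hypothetical, made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical helper: true if 'name' is a legal XML element name.
// XmlConvert.VerifyName throws XmlException for an invalid name.
bool IsValidElementName(string name)
{
    try
    {
        XmlConvert.VerifyName(name);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// Sample names: some break the first-character rule, some contain characters outside NameChar.
string[] candidates = { "Folder", "_temp", "my-folder.v2", "2015", "-draft", "R&D", "My Documents", "a/b" };

foreach (string candidate in candidates)
{
    Console.WriteLine($"{candidate,-15} {(IsValidElementName(candidate) ? "valid" : "invalid")}");
}
```

Names such as "2015" and "-draft" fail because of their first character, while "R&D", "My Documents" and "a/b" fail because they contain characters that are not allowed anywhere in a name.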

Since your ss string might include invalid characters, you can use XmlConvert.EncodeLocalName() to escape as required, then later use XmlConvert.DecodeName() to recover the original string when reading the XML.
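
A minimal sketch of that round trip, assuming C# 9 or later; the folder names are hypothetical. `EncodeLocalName` replaces each character that is not allowed in a local name with an `_xHHHH_` escape (a space, for example, becomes `_x0020_`), and `DecodeName` turns those escapes back into the original characters.

```csharp
using System;
using System.Xml;

// Hypothetical folder names containing characters that are illegal in XML names.
string[] folderNames = { "My Documents", "R&D", "2015 backup", "Folder" };

foreach (string original in folderNames)
{
    // Escape illegal characters (including an illegal first character) as _xHHHH_.
    string encoded = XmlConvert.EncodeLocalName(original);

    // Reverse the _xHHHH_ escapes, as you would when reading the XML back.
    string decoded = XmlConvert.DecodeName(encoded);

    Console.WriteLine($"{original} -> {encoded} -> {decoded}");
}
```

The encoded names are valid but hard to read, which is one reason to prefer the attribute-based approach described next.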

Thus, your code would look something like:

    public void write(string Dir, XmlWriter writer)
    {
        try
        {
            writer.WriteStartElement("Folders");
            foreach (string directoryPath in Directory.GetDirectories(Dir))
            {
                string directoryName = Path.GetFileName(directoryPath);
                writer.WriteStartElement(XmlConvert.EncodeLocalName(directoryName));
                foreach (string fileName in Directory.GetFiles(directoryPath))
                {
                    FileInfo fi = new FileInfo(fileName);
                    writer.WriteElementString(XmlConvert.EncodeLocalName(fileName), XmlConvert.ToString(fi.Length));
                }
                writer.WriteEndElement();
                write(directoryPath, writer);
            }
            writer.WriteEndElement();
        }
        catch (UnauthorizedAccessException ex)
        {
            Console.WriteLine(ex.Message);
            return;
        }
        catch (IOException ex)
        {
            Console.WriteLine(ex.Message);
            return;
        }
    }

However, I would suggest an alternate approach of using fixed element names, as recommended by @PaulAbbott, which stores directory and file names as attribute values:

    public void write(string Dir, XmlWriter writer)
    {
        try
        {
            writer.WriteStartElement("Folders");
            foreach (string directoryPath in Directory.GetDirectories(Dir))
            {
                string directoryName = Path.GetFileName(directoryPath);
                writer.WriteStartElement("Folder");
                writer.WriteAttributeString("Name", directoryName);
                foreach (string fileName in Directory.GetFiles(directoryPath))
                {
                    FileInfo fi = new FileInfo(fileName);
                    writer.WriteStartElement("File");
                    writer.WriteAttributeString("Name", fileName);
                    writer.WriteValue(fi.Length);
                    writer.WriteEndElement();
                }
                write(directoryPath, writer); // I moved this inside the outer <Folder> tag.
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        catch (UnauthorizedAccessException ex)
        {
            Console.WriteLine(ex.Message);
            return;
        }
        catch (IOException ex)
        {
            Console.WriteLine(ex.Message);
            return;
        }
    }

This should produce more readable XML, such as:

    <Folders>
      <Folder Name="WpfApplication1">
        <File Name="D:\Temp\Question27864746 XMLapp\WpfApplication1\WpfApplication1.sln">1014</File>
        <File Name="D:\Temp\Question27864746 XMLapp\WpfApplication1\WpfApplication1.v12.suo">84992</File>
        <Folders>
          <Folder Name="WpfApplication1">
            <File Name="D:\Temp\Question27864746 XMLapp\WpfApplication1\WpfApplication1\App.config">187</File>
            <File Name="D:\Temp\Question27864746 XMLapp\WpfApplication1\WpfApplication1\App.xaml">326</File>
          </Folder>
        </Folders>
      </Folder>
    </Folders>
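
The attribute-based version also sidesteps the escaping problem from the question. Attribute values, unlike element names, may contain almost any character, and `XmlWriter` escapes `&`, `<` and `"` itself, so escaping by hand first only produces double escaping. A small in-memory sketch, assuming C# 9 or later; `WriteFolderElement` and `ReadFolderName` are hypothetical helpers:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: writes <Folder Name="..." /> with XmlWriter and returns the markup.
string WriteFolderElement(string name)
{
    var buffer = new StringWriter();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (XmlWriter writer = XmlWriter.Create(buffer, settings))
    {
        writer.WriteStartElement("Folder");
        writer.WriteAttributeString("Name", name); // raw string: XmlWriter does the escaping
        writer.WriteEndElement();
    }
    return buffer.ToString();
}

// Hypothetical helper: parses the markup and returns the unescaped Name attribute value.
string ReadFolderName(string xml) => XDocument.Parse(xml).Root.Attribute("Name").Value;

// Correct: pass the folder name unchanged.
string good = WriteFolderElement("R&D");
Console.WriteLine(good);
Console.WriteLine(ReadFolderName(good));

// Wrong: escaping by hand makes XmlWriter escape the '&' of "&amp;" a second time.
string bad = WriteFolderElement("R&amp;D");
Console.WriteLine(bad);
Console.WriteLine(ReadFolderName(bad));
```

The first call writes `R&amp;D` and reads back `R&D`; the pre-escaped call writes `R&amp;amp;D` and reads back the literal text `R&amp;D`, so manual escaping is both unnecessary and harmful for attribute values.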

L.B

Instead of trying to fix your XML, use LINQ to XML (Linq2Xml) to achieve the same thing.

I would do it like this (no string manipulation, no special-character handling):

    using System.IO;
    using System.Xml.Linq;

    XElement Dir2Xml(string dir)
    {
        var dInfo = new DirectoryInfo(dir);
        var files = new XElement("files");

        foreach (var f in dInfo.GetFiles())
        {
            files.Add(new XElement("file", f.FullName)); // or use f.Name, whichever you like
        }

        foreach (var d in dInfo.GetDirectories())
        {
            files.Add(new XElement("directory", new XAttribute("name", d.Name), Dir2Xml(d.FullName)));
        }

        return files;
    }

    var xmlstring = Dir2Xml(@"c:\temp").ToString();
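
For comparison, here is a variant of the same LINQ to XML idea (not L.B's exact code) that uses fixed element names, stores names in attributes as in the answer above, and skips folders it is not allowed to read instead of aborting. It assumes C# 9 or later; `DirToXml`, the `AccessDenied` attribute and the temporary demo tree are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Builds <Folder Name="..."> with one <File Name="...">length</File> per file and a
// nested <Folder> per subdirectory. Names go into attributes, so no name encoding is needed.
XElement DirToXml(DirectoryInfo dir)
{
    var element = new XElement("Folder", new XAttribute("Name", dir.Name));
    try
    {
        foreach (FileInfo file in dir.GetFiles())
        {
            // XAttribute values are escaped (&, <, ") when the tree is serialized.
            element.Add(new XElement("File", new XAttribute("Name", file.Name), file.Length));
        }
        foreach (DirectoryInfo sub in dir.GetDirectories())
        {
            element.Add(DirToXml(sub));
        }
    }
    catch (UnauthorizedAccessException)
    {
        // Hypothetical marker: record that this folder could not be read and keep going.
        element.SetAttributeValue("AccessDenied", true);
    }
    return element;
}

// Create a throwaway tree whose names are illegal as XML element names.
string root = Path.Combine(Path.GetTempPath(), "xml-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "R&D", "2015 backup"));
File.WriteAllText(Path.Combine(root, "R&D", "notes & ideas.txt"), "hello");

string xml = DirToXml(new DirectoryInfo(root)).ToString();
Console.WriteLine(xml);

// Clean up the demo tree.
Directory.Delete(root, recursive: true);
```

The printed XML contains entries such as `Name="R&amp;D"`, and parsing it with `XElement.Parse` gives back the original names, so no manual escaping is involved at any point.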
