Scott

Reputation: 1253

How do I retrieve an XML entity value in C#?

I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.

I am able to retrieve the entity names easily enough using XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?

I noticed that I can retrieve the value of text-only entities using InnerText, but this doesn't work for entities that contain XML tags.

Is resorting to a regex the best approach?

Let's say that I have a document like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

I want to present a list to the user containing the three entity names (test, wwwc, and copy), along with their values (the text in quotes following the name). I had not thought through the question of entities nested within other entities, so I would be interested in a solution that either completely expands the entity values or shows the text just as it is in the quotes.
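For reference, the regex route could look roughly like the sketch below. It assumes all declarations live in the internal subset (as in the sample above, available via `doc.DocumentType.InternalSubset`) and that each value is a quoted literal. It returns the text exactly as written in the quotes, so nested references stay unexpanded. The class and method names are made up for illustration, and a DTD comment containing `<!ENTITY` would fool it, which is one reason a parser-based approach is more robust.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntityRegexSketch
{
    // Matches general entity declarations whose value is a quoted literal, e.g.
    //   <!ENTITY wwwc "World Wide Web Corporation">
    // Group 1 = name, group 2 = double-quoted value, group 3 = single-quoted value.
    // Parameter entities (<!ENTITY % ...>) and external entities (SYSTEM/PUBLIC)
    // do not match.
    private static readonly Regex EntityDecl = new Regex(
        @"<!ENTITY\s+([^\s%""']+)\s+(?:""([^""]*)""|'([^']*)')\s*>");

    // Returns each entity name with its value exactly as written between the
    // quotes; nested references such as &wwwc; are left unexpanded.
    public static Dictionary<string, string> RawEntityValues(string internalSubset)
    {
        var result = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(internalSubset))
            return result;

        foreach (Match m in EntityDecl.Matches(internalSubset))
        {
            string name = m.Groups[1].Value;
            string value = m.Groups[2].Success ? m.Groups[2].Value : m.Groups[3].Value;
            // XML 1.0: when an entity is declared more than once, the first declaration wins.
            if (!result.ContainsKey(name))
                result.Add(name, value);
        }
        return result;
    }
}
```

Usage would be something like `EntityRegexSketch.RawEntityValues(doc.DocumentType.InternalSubset)`, which for the sample yields `test`, `wwwc` and `copy` with their unexpanded literal text.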

Upvotes: 3

Views: 4157

Answers (5)

erdomke

Reputation: 5245

I ran into problems using the accepted solution. In particular:

  • In my document, the entity references needed a custom resolver to load them from external sources. Therefore, creating the wrapper elements from the original document (without appending them) was an easier approach than trying to replicate the DTD and resolver in a new XmlDocument.
  • In addition, the InnerXml property kept returning the entity reference instead of its expansion. To work around this, I copied the XML into an XElement, which resolves the entity automatically.
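using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

private IEnumerable<KeyValuePair<string, string>> AllEntityExpansions(XmlDocument doc)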
{
  var entities = doc.DocumentType.Entities;
  foreach (var entity in entities.OfType<XmlEntity>()
    .OrderBy(e => e.Name, StringComparer.OrdinalIgnoreCase))
  {
    var xmlString = default(string);
    try
    {
      var element = doc.CreateElement("e");
      element.AppendChild(doc.CreateEntityReference(entity.Name));
      using (var r = new XmlNodeReader(element))
      {
        var elem = XElement.Load(r);
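        // DisableFormatting stops XElement from indenting child elements into the value
        xmlString = elem.ToString(SaveOptions.DisableFormatting);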
      }
    }
    catch (XmlException) { }

    if (xmlString?.Length > 7)
      yield return new KeyValuePair<string, string>(entity.Name, xmlString.Substring(3, xmlString.Length - 7));
  }
}
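Here is the same technique packaged as a standalone class that can be run against the question's sample document. It is a sketch, not the answer's exact code: the class name is made up, and instead of trimming `<e>`/`</e>` off a string it concatenates the wrapper's expanded child nodes. It assumes the document was loaded with DTD processing enabled (which `XmlDocument.LoadXml` does for an internal subset).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class EntityExpansionSketch
{
    // For every general entity declared in the DTD, wraps a reference to it in a
    // throwaway element (never appended to the document), lets XElement.Load
    // resolve the reference, and serializes the expanded child nodes.
    public static Dictionary<string, string> ExpandAll(XmlDocument doc)
    {
        var result = new Dictionary<string, string>();
        if (doc.DocumentType == null)
            return result;

        foreach (var entity in doc.DocumentType.Entities.OfType<XmlEntity>())
        {
            var wrapper = doc.CreateElement("e");
            // Appending the reference to a parent triggers its expansion.
            wrapper.AppendChild(doc.CreateEntityReference(entity.Name));
            try
            {
                using (var reader = new XmlNodeReader(wrapper))
                {
                    XElement expanded = XElement.Load(reader);
                    // Join the children without indentation so formatting
                    // does not leak into the value.
                    result[entity.Name] = string.Concat(
                        expanded.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
                }
            }
            catch (XmlException)
            {
                // Skip entities that cannot be expanded (e.g. unresolvable external ones).
            }
        }
        return result;
    }
}
```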

Upvotes: 0

Scott

Reputation: 1253

Although it’s not likely the most elegant solution possible, I came up with something that seems to work well enough for my purposes. First, I parsed the original document. Then I created a small in-memory XML document and copied the original document type declaration (and therefore all of its entity declarations) into it. Next, I added a reference to each entity within the temporary XML. Finally, I retrieved the InnerXml from each of the references.

Here's some sample code:

        // requires: using System.Collections.Generic; using System.Net; using System.Xml;

        // parse the original document and retrieve its entities
        XmlDocument parsedXmlDocument = new XmlDocument();
        XmlUrlResolver resolver = new XmlUrlResolver();
        resolver.Credentials = CredentialCache.DefaultCredentials;
        parsedXmlDocument.XmlResolver = resolver;
        parsedXmlDocument.Load(path);

        // create a temporary xml document with all the entities and add references to them
        // the references can then be used to retrieve the value for each entity
        XmlDocument entitiesXmlDocument = new XmlDocument();
        XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
        entitiesXmlDocument.AppendChild(dec);
        XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(parsedXmlDocument.DocumentType.Name, parsedXmlDocument.DocumentType.PublicId, parsedXmlDocument.DocumentType.SystemId, parsedXmlDocument.DocumentType.InternalSubset);
        entitiesXmlDocument.AppendChild(newDocType);
        XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
        entitiesXmlDocument.AppendChild(root);
        XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

        // build a dictionary of entity names and values
        Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
        for (int i = 0; i < entitiesMap.Count; i++)
        {
            XmlElement entityElement = entitiesXmlDocument.CreateElement(entitiesMap.Item(i).Name);
            XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entitiesMap.Item(i).Name);
            entityElement.AppendChild(entityRefElement);
            root.AppendChild(entityElement);
            // do not add parameter entities or invalid entities;
            // these can be detected by checking for an empty string
            if (!string.IsNullOrEmpty(entityElement.ChildNodes[0].InnerXml))
            {
                entitiesDictionary.Add(entitiesMap.Item(i).Name, entityElement.ChildNodes[0].InnerXml);
            }
        }

Upvotes: 2

Josh

Reputation: 44916

You can easily display a representation of an XML document simply by walking the tree recursively.

This small class writes to the console, but you could easily adapt it to your needs.

using System;
using System.Xml.Linq;

public static class XmlPrinter {
   private const Int32 SpacesPerIndent = 3;

   public static void Print(XDocument xDocument) {
      if (xDocument == null) {
         Console.WriteLine("No XML Document Provided");
         return;
      }

      PrintElementRecursive(xDocument.Root);
   }

   private static void PrintElementRecursive(XElement element, Int32 indentationLevel = 0) {
      if(element == null) return;

      PrintIndentation(indentationLevel);
      PrintElement(element);
      PrintNewline();

      foreach (var xAttribute in element.Attributes()) {
         PrintIndentation(indentationLevel + 1);
         PrintAttribute(xAttribute);
         PrintNewline();
      }

      foreach (var xElement in element.Elements()) {
         PrintElementRecursive(xElement, indentationLevel+1);
      }
   }

   private static void PrintAttribute(XAttribute xAttribute) {
      if (xAttribute == null) return;

      Console.Write("[{0}] = \"{1}\"", xAttribute.Name, xAttribute.Value);
   }

   private static void PrintElement(XElement element) {
      if (element == null) return;

      Console.Write("{0}", element.Name);

      if(!String.IsNullOrWhiteSpace(element.Value))
         Console.Write(" : {0}", element.Value);
   }

   private static void PrintIndentation(Int32 level) {
      Console.Write(new String(' ', level * SpacesPerIndent));
   }

   private static void PrintNewline() {
      Console.Write(Environment.NewLine);
   }
}

Using the class is trivial. Here is an example that prints out your current config file:

// requires a reference to System.Configuration.dll (using System.Configuration;)
static void Main(string[] args) {
   XmlPrinter.Print(XDocument.Load(
      ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None).FilePath));

   Console.ReadKey();
}

Try it for yourself, and you should be able to quickly modify it to get what you want.

Upvotes: 0

pgfearo

Reputation: 2256

This is one way (untested). It uses XmlReader and its ResolveEntity() method:

private Dictionary<string, string> GetEntities(XmlReader xr)
{
    Dictionary<string, string> entityList = new Dictionary<string, string>();

    while (xr.Read())
    {
        HandleNode(xr, entityList);
    }
    return entityList;
}

StringBuilder sbEntityResolver;
int extElementIndex = 0;
int resolveEntityNestLevel = -1;
string dtdCurrentTopEntity = "";

private void HandleNode(XmlReader inReader, Dictionary<string, string> entityList)
{
    if (inReader.NodeType == XmlNodeType.Element)
    {
        if (resolveEntityNestLevel < 0)
        {
            while (inReader.MoveToNextAttribute())
            {
                HandleNode(inReader, entityList); // for namespaces
                while (inReader.ReadAttributeValue())
                {
                    HandleNode(inReader, entityList); // recursive for resolving entity refs in attributes
                }
            }
        }
        else
        {
            extElementIndex++;
            sbEntityResolver.Append(inReader.ReadOuterXml());
            resolveEntityNestLevel--;
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.EntityReference)
    {
        if (inReader.Name[0] != '#' && !entityList.ContainsKey(inReader.Name))
        {
            if (resolveEntityNestLevel < 0)
            {
                sbEntityResolver = new StringBuilder(); // start building entity
                dtdCurrentTopEntity = inReader.Name;
            }
            // entityReference can have contents that contains other
            // entityReferences, so keep track of nest level
            resolveEntityNestLevel++;
            inReader.ResolveEntity();
        }
    }
    else if (inReader.NodeType == XmlNodeType.EndEntity)
    {
        resolveEntityNestLevel--;
        if (resolveEntityNestLevel < 0)
        {
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.Text)
    {
        if (resolveEntityNestLevel > -1)
        {
            sbEntityResolver.Append(inReader.Value);
        }
    }
}
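Two things worth checking before relying on this reader-based approach: the reader must report general entity references as EntityReference nodes instead of expanding them silently, and only entities that are actually referenced in the document will be found. The sketch below (hypothetical class and method names) uses an XmlTextReader with EntityHandling.ExpandCharEntities, which reports those nodes, including references inside attribute values via ReadAttributeValue(). On the question's sample it finds only `wwwc`, because `test` and `copy` are declared but never used.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EntityReferenceScanSketch
{
    // Returns the names of general entity references found in element content
    // or attribute values, in document order, without duplicates.
    public static List<string> ReferencedEntityNames(string xml)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.DtdProcessing = DtdProcessing.Parse;
            // Expand character references but report &name; as EntityReference nodes.
            reader.EntityHandling = EntityHandling.ExpandCharEntities;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.EntityReference)
                    AddOnce(names, reader.Name);

                if (reader.NodeType == XmlNodeType.Element)
                {
                    // References inside attribute values are only visible by
                    // walking each attribute's value nodes.
                    while (reader.MoveToNextAttribute())
                    {
                        while (reader.ReadAttributeValue())
                        {
                            if (reader.NodeType == XmlNodeType.EntityReference)
                                AddOnce(names, reader.Name);
                        }
                    }
                    reader.MoveToElement();
                }
            }
        }
        return names;
    }

    private static void AddOnce(List<string> names, string name)
    {
        if (!names.Contains(name))
            names.Add(name);
    }
}
```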

Upvotes: 1

Adam Jones

Reputation: 721

If you have an XmlDocument object, perhaps it would be easier to recursively step through each XmlNode object (from XmlDocument.ChildNodes), and for each node you can use the Name property to get the name of the node. Then "getting the value" depends on what you want (InnerXml for a string representation, ChildNodes for programmatic access to the XmlNode objects which can be cast to XmlEntity/XmlAttribute/XmlText).
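A minimal sketch of that recursive walk (class and method names are made up for illustration) records "Name = InnerXml" for every node reachable through ChildNodes. Note that this walk alone does not reach the entity declarations: the DocumentType node has no child nodes, and the entities are exposed through its Entities collection instead. Attributes are likewise reached via Attributes, not ChildNodes.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class NodeWalkSketch
{
    // Returns one line per node reachable through ChildNodes, indented by
    // depth, in the form "Name = InnerXml".
    public static List<string> Describe(XmlNode root)
    {
        var lines = new List<string>();
        Walk(root, 0, lines);
        return lines;
    }

    private static void Walk(XmlNode node, int depth, List<string> lines)
    {
        lines.Add(new string(' ', depth * 2) + node.Name + " = " + node.InnerXml);
        foreach (XmlNode child in node.ChildNodes)
            Walk(child, depth + 1, lines);
    }
}
```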

Upvotes: 0
