Inconsistency between XmlReader.Read() and XmlReader.ReadStartElement()

Question

I am trying to understand an apparent inconsistency between XmlReader.Read() and XmlReader.ReadStartElement(). With reader1 below, everything behaves as expected: it takes three reads to consume the whole XML. More importantly, after the first read, which reads the <firstname> start element, reader1.Value is empty. After the second read, reader1.Value is the text node's value.

With reader2 I expected the same reading order because, as far as I know, ReadStartElement() internally calls Read(), and it should only read one node, here the <firstname> start element. It is almost as if ReadStartElement("firstname") could be replaced by a check that the current node is a start element named firstname, followed by a call to Read(). Why is reader2.Value not empty right after ReadStartElement("firstname")? I first asked this under a question by @lesscode, whose explanation was that, according to MSDN, ReadStartElement() advances the XmlReader to the next node and reader.Value is the value of the current node. But if so, isn't this inconsistent between Read() and ReadStartElement()? With Read() the Value you retrieve afterwards belongs to the node just read, whereas with ReadStartElement() the Value you retrieve afterwards belongs to the node that follows the start element.

var simpleElement = "<firstname>Jim</firstname>";
using (var reader1 = XmlReader.Create(new StringReader(simpleElement)))
{
    var i = 1;
    while (reader1.Read())
    {
        Console.WriteLine($"i = {i++}; value = {reader1.Value}");
    }
}

using (var reader2 = XmlReader.Create(new StringReader(simpleElement)))
{
    // This internally calls Read(), which should have read ONLY the 'firstname' start element node.
    reader2.ReadStartElement("firstname");

    // Prints Jim; but why? The text node has NOT been read yet!
    Console.WriteLine(reader2.Value);

    reader2.Read(); // Why is this line needed, given the text node has been read already?
    reader2.ReadEndElement();
}

Arnaud Develay · Accepted Answer

You can check the source code on GitHub: XmlReader.cs.

As you can see below, the two methods do not behave the same way:

// Checks that the current node is an element and advances the reader to the next node.
public virtual void ReadStartElement() {
    if (MoveToContent() != XmlNodeType.Element) {
        throw new XmlException(Res.Xml_InvalidNodeType, this.NodeType.ToString(), this as IXmlLineInfo);
    }
    Read();
}

// Checks whether the current node is a content (non-whitespace text, CDATA, Element, EndElement, EntityReference
// or EndEntity) node. If the node is not a content node, then the method skips ahead to the next content node or 
// end of file. Skips over nodes of type ProcessingInstruction, DocumentType, Comment, Whitespace and SignificantWhitespace.
public virtual  XmlNodeType  MoveToContent() {
    do {
        switch (this.NodeType) {
            case XmlNodeType.Attribute:
                MoveToElement();
                goto case XmlNodeType.Element;
            case XmlNodeType.Element:
            case XmlNodeType.EndElement:
            case XmlNodeType.CDATA:
            case XmlNodeType.Text:
            case XmlNodeType.EntityReference:
            case XmlNodeType.EndEntity:
                return this.NodeType;
        }
    } while (Read());
    return this.NodeType;
}

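So ReadStartElement calls MoveToContent, which may make several Read calls to reach a content node. Once the reader is positioned on the start element, ReadStartElement calls Read once more, which moves the reader past the start element and onto the next node. In your example that next node is the text node, which is why reader2.Value is already "Jim". The explicit Read() that follows is still needed to move from the text node to the end element, because ReadEndElement expects the reader to be positioned on an end tag.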
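To see where the reader is positioned after each call, a minimal trace along these lines may help. The ReaderTrace class and its Describe and Trace helpers are hypothetical names introduced for illustration; only the XmlReader calls and the sample XML come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderTrace
{
    // Formats the node the reader is currently positioned on.
    static string Describe(XmlReader r) =>
        $"{r.NodeType} name='{r.Name}' value='{r.Value}'";

    // Hypothetical helper: records the reader's position after each call
    // made in the question's reader2 example.
    public static List<string> Trace(string xml)
    {
        var log = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            log.Add("initial: " + Describe(reader));

            // Moves to the <firstname> start element, then one node past it.
            reader.ReadStartElement("firstname");
            log.Add("after ReadStartElement: " + Describe(reader));

            // Moves from the text node to the end element.
            reader.Read();
            log.Add("after Read: " + Describe(reader));

            // Checks for the end element, then moves past it.
            reader.ReadEndElement();
            log.Add("after ReadEndElement: " + Describe(reader));
        }
        return log;
    }

    public static void Main()
    {
        foreach (var line in Trace("<firstname>Jim</firstname>"))
            Console.WriteLine(line);
    }
}
```

With the question's input, the second entry should report a Text node whose value is Jim, and the third an EndElement named firstname. In other words, Value always describes the node the reader is currently on, and ReadStartElement leaves the reader one node further along than a single Read() would.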
