Reputation: 1493
In the XML file below, the element "book" and its child "bookid" are written differently from the "toc-entry" elements: "book" and "bookid" begin with <book> and <bookid> respectively and end with </book> and </bookid>, while "toc-entry" elements begin with <toc-entry and end with either /> or >, depending on whether they have a child.
My question is: why is there such a difference?
<?xml version="1.0" encoding="UTF-8"?>
<bs-submission participant-id="0"
run-id="GROUNDTRUTH"
task="book-toc"
toc-creation="semi-automatic"
toc-source="full-content">
<source-files xml="no" pdf="no" />
<description>
This file contains the annotated groundtruth file (ideal ToCs) for the 2013 ICDAR Book Structure Extraction competition.
</description>
<book>
<bookid>6AD91AD5A04A7129</bookid>
<toc-entry title="DEDICATION." page="7"/>
<toc-entry title="HISTORICAL CATECHISM. CHIEFLY RELATING TO THE ENGLISH PROVINCE OF THE SOCIETY." page="9"/>
<toc-entry title="Collections, Illustrating the Biography, &c." page="19">
<toc-entry title="SCOTCH MEMBERS, S. J." page="19"/>
</toc-entry>
<toc-entry title="Collections, Illustrating the Biography, &c." page="44">
<toc-entry title="ENGLISH MEMBERS, S. J." page="44"/>
</toc-entry>
<toc-entry title="Collections, Illustrating the Biography, &c." page="231">
<toc-entry title="IRISH MEMBERS, S. J." page="231"/>
</toc-entry>
<toc-entry title="REMARKS ON THE CASE OF THE JESUITS. 1829." page="271"/>
</book>
Upvotes: 0
Views: 51
Reputation: 52858
As you hinted in your question, the difference is that some elements have children and some do not (they're empty).
An empty element can be written either as a single empty-element tag ending with />, or as a start tag immediately followed by an end tag ('</' Name S? '>').
Empty-element tags that end with /> are sometimes referred to as self-closing.
<toc-entry title="some title" page="1"/>
is the same thing as:
<toc-entry title="some title" page="1"></toc-entry>
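A quick way to check this equivalence is to parse both forms and compare what the parser reports. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name summarize and the NESTED sample are made up for illustration and are not part of the question's file.

```python
import xml.etree.ElementTree as ET

# The two spellings of the same empty element from the answer above.
SELF_CLOSING = '<toc-entry title="some title" page="1"/>'
START_END = '<toc-entry title="some title" page="1"></toc-entry>'

# Illustrative sample: an entry that has a child needs separate start and end tags.
NESTED = (
    '<toc-entry title="parent" page="19">'
    '<toc-entry title="child" page="19"/>'
    '</toc-entry>'
)


def summarize(xml_text):
    """Return (tag, attributes, text, number of children) for the root element."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib), root.text, len(root)


if __name__ == "__main__":
    # Both spellings produce the same parsed element.
    print(summarize(SELF_CLOSING) == summarize(START_END))
    # The nested entry reports one child; the empty ones report zero.
    print(summarize(NESTED)[3], summarize(SELF_CLOSING)[3])
```

Once parsed, the two spellings are indistinguishable; only an element with content, such as the nested entry, requires separate start and end tags.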
From the spec:
Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.
Upvotes: 1
Reputation: 2490
In XML, it is mandatory that each element is opened and later closed. An element contains everything that's between its opening tag and its closing tag.
To achieve that, there are three kinds of tags:
Opening tags, such as <book>
Closing tags, which begin with </, such as </book>
Self-closing tags, which begin with < and end with />, such as <entry name="stuff"/>
Opening tags and closing tags, as their names indicate, open and close elements.
Self-closing tags are a shorthand that does both at once. In XML syntax they are 100% equivalent to opening the element and closing it immediately afterwards, so writing <entry name="stuff"/> is exactly the same as <entry name="stuff"></entry>.
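One way to see that the two spellings carry the same information is to run both through XML canonicalization, which normalizes such purely syntactic differences. This is a minimal sketch, assuming Python 3.8 or later (where xml.etree.ElementTree.canonicalize is available); the helper name canonical is made up for illustration.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Return the canonical (C14N 2.0) serialization of an XML string."""
    return ET.canonicalize(xml_data=xml_text)


# Canonical XML always writes empty elements as a start tag plus an end tag,
# so both spellings end up as the same string.
short_form = canonical('<entry name="stuff"/>')
long_form = canonical('<entry name="stuff"></entry>')

if __name__ == "__main__":
    print(short_form)
    print(short_form == long_form)
```

Canonical XML picks the start-tag/end-tag spelling as its single normal form, which is why both inputs serialize identically.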
Upvotes: 2