Reputation: 141
I am a beginner in SQL and need to answer the following question. I have thousands of XML files (each has hundreds of nodes) and need to build a Postgresql database above the data inside them.
I consider two ways:
Which approach could be better (faster, more comfortable)? What are the advantages and disadvantages of using XML type inside SQL database?
Upvotes: 3
Views: 2003
Reputation: 324821
Option (1) is terrible, don't do that. A single giant table with one row per node will be awful to query, like EAV with added sulphurous stink and bonus horns.
Either model the data represented by the XML as entities (tables) and relationships, or just store the XML documents in the DB.
You can only usefully model the XML as entities and relationships if the XML is regularly structured. If you have a bunch of different free-form XML files then you can't really model it usefully in an RDBMS. If they are regular this is often the best option, eg:
<root>
<parentnode attrib="value">
<child otherattrib="value2">content</child>
</parentnode>
<...>
</root>
you could model this as:
parent
table with an id
and an attrib
column; andchild
table with an id
and otherattrib
column as well as a parent_id
column that has a foreign key reference to parent(id)
.Exactly how to model your XML depends on the XML. What's mandatory and what isn't? Do you need to reconstruct the exact ordering of the entities in the input XML or is order within a node insignificant? Are there free-form nestable entities?
As an example of just one kind of decision, if you have a parent node that can have either zero or one (but no more) of a given child node type, you could choose to model that with two tables and a 1:1 optional relationship or you could merge the child element into the parent in a single table where the child attributes/content are nullable. There are advantages and disadvantages to both in terms of performance (join cost vs table width and rows-per-page) and ease of use.
If the XML structure is strict it's often convenient to model as tables; if it's free form it's usually more useful to store it as XML in the DB and query it with xpath.
If you keep it as XML documents it's harder to index and query within the DB but much easier to get the XML out of the DB and to apps. Functional indexes on xpath expressions can help a lot. It's also worth adding a CHECK
constraint to enforce that the xml
field IS DOCUMENT
so you can't store XML fragments, only whole documents.
Upvotes: 6