Marek Hajžman
Marek Hajžman

Reputation: 141

Advantage of Postgresql XML type?

I am a beginner in SQL and need to answer the following question. I have thousands of XML files (each has hundreds of nodes) and need to build a Postgresql database above the data inside them.

I consider two ways:

  1. to convert XML files into one (or more) large database table (one XML node = one line in SQL table) and work with this table.
  2. to create a database with native XML type (store XML type data inside database) and for searching and filtering use XPath...

Which approach could be better (faster, more comfortable)? What are the advantages and disadvantages of using XML type inside SQL database?

Upvotes: 3

Views: 2003

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324821

Option (1) is terrible, don't do that. A single giant table with one row per node will be awful to query, like EAV with added sulphurous stink and bonus horns.

Either model the data represented by the XML as entities (tables) and relationships, or just store the XML documents in the DB.

You can only usefully model the XML as entities and relationships if the XML is regularly structured. If you have a bunch of different free-form XML files then you can't really model it usefully in an RDBMS. If they are regular this is often the best option, eg:

<root>
   <parentnode attrib="value">
      <child otherattrib="value2">content</child>
   </parentnode>
   <...>
</root>

you could model this as:

  • a parent table with an id and an attrib column; and
  • a child table with an id and otherattrib column as well as a parent_id column that has a foreign key reference to parent(id).

Exactly how to model your XML depends on the XML. What's mandatory and what isn't? Do you need to reconstruct the exact ordering of the entities in the input XML or is order within a node insignificant? Are there free-form nestable entities?

As an example of just one kind of decision, if you have a parent node that can have either zero or one (but no more) of a given child node type, you could choose to model that with two tables and a 1:1 optional relationship or you could merge the child element into the parent in a single table where the child attributes/content are nullable. There are advantages and disadvantages to both in terms of performance (join cost vs table width and rows-per-page) and ease of use.

If the XML structure is strict it's often convenient to model as tables; if it's free form it's usually more useful to store it as XML in the DB and query it with xpath.

If you keep it as XML documents it's harder to index and query within the DB but much easier to get the XML out of the DB and to apps. Functional indexes on xpath expressions can help a lot. It's also worth adding a CHECK constraint to enforce that the xml field IS DOCUMENT so you can't store XML fragments, only whole documents.

Upvotes: 6

Related Questions