pguetschow
pguetschow

Reputation: 5337

PostgreSQL to XML with 3 Tables

I'm a trainee in a small dev team and my project leader wants me to write a function that will export PostgreSQL data to a XML file. Unfortunately, I only know how to write an export to a csv.

There are 3 different Tables and he wants it like this (XML view)

<Table 1 Col1=".." Col2="..">
    <Table 2 Col1="...">
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />
        <Table3 Col1=".." Col2="" Col3=".." Col4="..." />       
     </Table2>
    <Table1>
        <Table1>....</Table>
    </Table>
</Table2>
<Table1 Col1="xxx" Col2="xxx">
 ...

The tags are each for my table names. How do I write the code to export this?

Other questions on StackOverflow are mainly focused on exporting a single table, so I hope that this question will also help others trying to export multiple tables.

Upvotes: 2

Views: 2927

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324811

You have three levels of nested tables.

Sample data:

CREATE TABLE a(
  a_id integer primary key,
  name text
);

CREATE TABLE b(
  b_id integer primary key,
  a_id integer references a(a_id),
  val text
);

CREATE TABLE c(
  c_id serial primary key,
  b_id integer references b(b_id),
  blah text
);

INSERT INTO a(a_id, name) VALUES (1, 'fred'),(2, 'bert');

INSERT INTO b(b_id, a_id, val) VALUES 
(11, 1, 'x'), (12, 1, 'y'), (21, 2, 'a'), (22, 2, 'b');

INSERT INTO c(b_id, blah) VALUES
(11, 'whatever'), (11, 'gah'), (12, 'borkbork'), (22, 'fuzz');

Method 1: Do a left join, handle XML in the client

The simplest way to handle this is to do a left join over all three tables, ordered from outermost to innermost. Then you iterate down the result set, closing one element and opening another whenever the subject at that level changes.

select *
from a left join b on (a.a_id = b.a_id)
       left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;

then loop over the rows returned, and for each row, pseudocode:

cur_row = get_new_row()

if (cur_row[b_id] != prev_row[b_id]) {
   emit_close_tableb();
}
if (cur_row[a_id] != prev_row[a_id]) {
   emit_close_tablea();
   emit_open_tablea(cur_row);
}
if (cur_row[b_id] != prev_row[b_id]) {
   emit_open_tableb(cur_row);
}
emit_tablec(cur_row);

prev_row = cur_row;

To write the XML you'd use something like XMLWriter. To read the query data you can use something like PDO or whatever driver you prefer. If the data set is large consider using a cursor to read the data.

This works well, but it transfers a lot of excess data, since you transfer n copies of the outer table's data for each n rows of the inner table associated with it.


To reduce the excess data exchanged you can select only the IDs for the outer tables

select a.a_id, b.b_id, c.*
from a left join b on (a.a_id = b.a_id)
       left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;

... then when you switch to a new tablea / tableb, SELECT the rest of its rows then. You'll probably use a second connection to do this so you don't distrupt the result set and cursor state on the main connection you're reading rows from.

Method 2: Do it all in PostgreSQL

For smaller data sets, or for the inner levels of bigger data sets, you can use PostgreSQL's XML support to construct the XML documents, e.g.:

WITH xmlinput AS (
  SELECT a, b, c
  FROM a
  LEFT JOIN b ON (a.a_id = b.a_id)
  LEFT JOIN c on (b.b_id = c.b_id)
  ORDER BY a.a_id, b.b_id, c.c_id
)
SELECT
  XMLELEMENT(name items,
    xmlagg(
      XMLELEMENT(name a,
        XMLFOREST((a).a_id AS a_id, (a)."name" AS name),
        b_xml
      )
    ORDER BY (a).a_id)
  ) AS output
FROM
(
  SELECT
    a,
    xmlagg(
      XMLELEMENT(name b,
        XMLFOREST((b).b_id AS b_id, (b).val AS val),
        c_xml
      )
    ORDER BY (b).b_id)
    AS b_xml
  FROM
  (
    SELECT
      a, b,
      xmlagg(
        XMLELEMENT(name c,
          XMLFOREST((c).c_id AS c_id, (c).blah AS blah)
        )
      ORDER BY (c).c_id)
      AS c_xml
    FROM xmlinput
    GROUP BY a, b
  ) c_as_xml
  GROUP BY a
) b_as_xml;

... but really, you've got to be some kind of masochist to write code like that. Though it could prove to be fairly fast.

To understand the query you'll need to read the PostgreSQL XML docs. The wackly syntax was dreamed up by the SQL/XML committee, don't blame us.

Also note that row-variables are used heavily in the above code to keep it organized. a, b and c are passed as whole rows to outer layers of the query. This avoids the need to mess with aliases when names collide. The syntax (a).a_id, etc, means "the a_id field of the row-variable a". See the PostgreSQL manual for details.

The above uses a better XML structure (see comments below). If you want to emit attributes not elements, you can change the XMLFOREST calls to XMLATTRIBUTES calls.

Output:

<items><a><a_id>1</a_id><name>fred</name><b><b_id>11</b_id><val>x</val><c><c_id>1</c_id><blah>whatever</blah></c><c><c_id>2</c_id><blah>gah</blah></c></b><b><b_id>12</b_id><val>y</val><c><c_id>3</c_id><blah>borkbork</blah></c></b></a><a><a_id>2</a_id><name>bert</name><b><b_id>21</b_id><val>a</val><c/></b><b><b_id>22</b_id><val>b</val><c><c_id>4</c_id><blah>fuzz</blah></c></b></a></items>

or, pretty-printed:

<?xml version="1.0" encoding="utf-16"?>
<items>
    <a>
        <a_id>1</a_id>
        <name>fred</name>
        <b>
            <b_id>11</b_id>
            <val>x</val>
            <c>
                <c_id>1</c_id>
                <blah>whatever</blah>
            </c>
            <c>
                <c_id>2</c_id>
                <blah>gah</blah>
            </c>
        </b>
        <b>
            <b_id>12</b_id>
            <val>y</val>
            <c>
                <c_id>3</c_id>
                <blah>borkbork</blah>
            </c>
        </b>
    </a>
    <a>
        <a_id>2</a_id>
        <name>bert</name>
        <b>
            <b_id>21</b_id>
            <val>a</val>
            <c />
        </b>
        <b>
            <b_id>22</b_id>
            <val>b</val>
            <c>
                <c_id>4</c_id>
                <blah>fuzz</blah>
            </c>
        </b>
    </a>
</items>

Please emit better XML

On a side note, using attributes like that in XML seems tempting, but it quickly gets difficult and ugly to work with. Please just use normal XML elements:

  <Table 1>
    <Nr>1</Nr>
    <Name>blah</Name>
     <Table 2>
       <Nr>1</Nr>
       <Table 3>
          <Col1>42</Col1>
          <Col2>...</Col2>
          <Col3>...</Col3>
          <Col4>...</Col4>
          ...
       </Table 3>
     </Table 2>
   </Table 1>

Upvotes: 3

Related Questions