TomD
TomD

Reputation: 13

Querying a CLOB column containing an XML-message

I'm trying to datamine a table. The table contains a column where xml-messages are stored, using the CLOB datatype. The XML-message has the following structure:

<t:MsgNameXML typeInfo="source" xsi:schemaLocation="http://example/location FileName.xsd" xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"      >
    <t:info1>data1</t:info1>
    <t:info2>data2</t:info2>
    <t:Section>
        <t:SubSection>
            <t:SubSubSection>
                <t:variable1>data3</t:variable1>
                <t:variable2>data4</t:variable2>
            </t:SubSubSection>
        </t:SubSection>
    </t:Section>
    <t:Section>
        <t:SubSection>
            <t:SubSubSection>
                <t:variable1>data5</t:variable1>
                <t:variable2>data6</t:variable2>
            </t:SubSubSection>
        </t:SubSection>
    </t:Section>
</t:MsgNameXML>

So, at least for me, the main difficulties are the namespaces and the fact that there a multiple nodes.

I hope to get the following output (for the record containing the above XML):

INFO1   INFO2   VARIABLE1   VARIABLE2
-------------------------------------
data1   data2   data3       data4
data1   data2   data5       data6

I've constructed the following query to try to extract "variable1" (Oracle database, so PL SQL, I guess):

select 
extractvalue(xmltype(CLOB_COLUMN_NAME),
'/t:MsgNameXML/t:Section/t:SubSection/t:SubSubSection/t:variable1',
'xmlns:t="xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')

from 
TABLE_NAME,
TABLE(XMLSEQUENCE(EXTRACT(CLOB_COLUMN_NAME, '/t:MsgNameXML/t:Section/t:SubSection/t:SubSubSection/t:variable1')))

This unfortunately produces the following error message:

ORA-00932: inconsistent datatypes: expected - got -
00932. 00000 -  "inconsistent datatypes: expected %s got %s"
*Cause:    
*Action:
Error at Line: 9 Column: 33

(Line 9, column 33, refers to where CLOB_COLUMN_NAME is located after the EXTRACT-function in the from-clause).

Does anyone know what I'm doing wrong, and how I could correct this?

In addition, I'd need to expand this query to extract more values from the XML-message, rather than just the "variable1" (see desired result above). Any help on that would be much appreciated as well.

Thank you for your help,

Tom

Upvotes: 1

Views: 4763

Answers (1)

Alex Poole
Alex Poole

Reputation: 191275

The immediate cause of your ORA-00932 is that you're trying to extract directly from a CLOB; you're converting on one place already, but this:

TABLE(XMLSEQUENCE(EXTRACT(CLOB_COLUMN_NAME, ... )))

... would need to be:

TABLE(XMLSEQUENCE(EXTRACT(XMLType(CLOB_COLUMN_NAME), ... )))

But the namespace still isn't recognised in that clause, and you'd get ORA-31011 if you only change that. You can get something out by including the namespace in the EXTRACT call as well, and correcting the typo in that:

select extractvalue(column_value, '/t:variable1',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as variable1
from 
TABLE_NAME,
TABLE(XMLSEQUENCE(EXTRACT(XMLType(CLOB_COLUMN_NAME),
  '/t:MsgNameXML/t:Section/t:SubSection/t:SubSubSection/t:variable1',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')));

VARIABLE1
----------
data3      
data5      

Or both variables at once by shifting the extract back a level:

select extractvalue(column_value, 't:SubSubSection/t:variable1',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as variable1,
  extractvalue(column_value, 't:SubSubSection/t:variable2',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as variable2
from 
TABLE_NAME,
TABLE(XMLSEQUENCE(EXTRACT(XMLType(CLOB_COLUMN_NAME),
  '/t:MsgNameXML/t:Section/t:SubSection/t:SubSubSection',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')));

VARIABLE1  VARIABLE2
---------- ----------
data3      data4      
data5      data6      

Including the info values as well is going to make it more complicated and repetitive:

select extractvalue(XMLType(CLOB_COLUMN_NAME), 't:MsgNameXML/t:info1',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as info1,
  extractvalue(XMLType(CLOB_COLUMN_NAME), 't:MsgNameXML/t:info2',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as info2,
  extractvalue(column_value, 't:SubSubSection/t:variable1',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as variable1,
  extractvalue(column_value, 't:SubSubSection/t:variable2',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')
  as variable2
from 
TABLE_NAME,
TABLE(XMLSEQUENCE(EXTRACT(XMLType(CLOB_COLUMN_NAME),
  '/t:MsgNameXML/t:Section/t:SubSection/t:SubSubSection',
  'xmlns:t="http://url/of/custom/namespace" xmlns:"http://url/of/default/namespace"')));

INFO1      INFO2      VARIABLE1  VARIABLE2
---------- ---------- ---------- ----------
data1      data2      data3      data4      
data1      data2      data5      data6      

SQL Fiddle of this approach.


You can use XQuery and XMLTable instead of the older extract syntax:

select info1, info2, variable1, variable2
from table_name,
  xmltable('declare namespace t = "http://url/of/custom/namespace"; (: :)
    for $i in /t:MsgNameXML
      let $info1 := $i/t:info1,
        $info2 := $i/t:info2
      for $j in $i/t:Section/t:SubSection/t:SubSubSection
        let $variable1 := $j/t:variable1,
          $variable2 := $j/t:variable2
      return <tmp>
          <info1>{$info1}</info1>
          <info2>{$info2}</info2>
          <variable1>{$variable1}</variable1>
          <variable2>{$variable2}</variable2>
        </tmp>'
passing xmltype(CLOB_COLUMN_NAME)
columns
  info1     char(10) path '/tmp/info1',
  info2     char(10) path '/tmp/info2',
  variable1 char(10) path '/tmp/variable1',
  variable2 char(10) path '/tmp/variable2'
);

INFO1      INFO2      VARIABLE1  VARIABLE2
---------- ---------- ---------- ----------
data1      data2      data3      data4      
data1      data2      data5      data6      

SQL Fiddle.

I had to tweak the CLOB value a little to make it valid; changing xsi= to xmlns:xsi= and setting a dummy name for the default namespace, xmlns:x="http://url/of/default/namespace". I think that was all...

XQuery is a "functional language based on sequences", and this uses "FLOWR" expressions. Specifically two for loops to iterate over the structure, and let expressions to assign the namespace variables to local variables which are then returned in a simpler XML structure, from which the columns can be extracted via their new paths. (You don't seem to be able to use namespaces directly in the path strings).

It should be fairly easy to expand this to get more values out of your message - more let expressions, and more matching path extractions.

But I'm really just scratching the surface of this stuff myself, so there may be simpler/better/faster ways to do this; hopefully a starting point for an XQuery approach, anyway.

Upvotes: 1

Related Questions