tjxob
tjxob

Reputation: 19

Extract HTML content from XML in Oracle

I am migrating an application from Sedna to Oracle. I have a collection/table that contents HTML code.

<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>

My goal is to extract the HTML content of <description> tag. When I run the following query, it returns the content but it removes the HTML tags.

SELECT xmllines.* FROM
XMLTABLE ( '/descriptions'
    PASSING xmltype('<descriptions> 
<description acad-plan="SAGSY1">
<p>
  <strong>test?</strong>
</p>
<p>test</p>
</description>
</descriptions>'
) COLUMNS
        id VARCHAR2(512) PATH 'description'
) xmllines;

The output of this query is :

test?test

What I am looking for is

<p><strong>test </strong> </p> <p>test</p>

Any idea how to fix this?

Upvotes: 0

Views: 642

Answers (1)

Alex Poole
Alex Poole

Reputation: 191570

You appear to want one of these:

select html
from xmltable('/'
  passing xmltype('<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>')
  columns html xmltype path '/descriptions/description/*');

select html
from xmltable('/descriptions'
  passing xmltype('<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>')
  columns html xmltype path 'description/*'
);

select html
from xmltable('/descriptions/description'
  passing xmltype('<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>')
  columns html xmltype path '*'
);

depending on the number and type of repeated nodes; if any; or if there is only one node to extract you coudl use an XMLQuery instead:

select xmlquery('/descriptions/description/*'
  passing xmltype('<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>')
  returning content) as html
from dual;

All of those get the same output:

HTML                                                                            
--------------------------------------------------------------------------------
<p><strong>test?</strong></p><p>test</p>

I've changed the path to get * under the description node; and changed the returned column type from the XMLTable calls to XMLType. The XMLQuery returns that type too.

If you want the result as a plain string you can use the XMLType getStringVal() function; like this for the XMLTable versions:

select xmltype.getStringVal(html) as html
from xmltable(...)

or like this for the XMLQuery version:

select xmlquery('/descriptions/description/*'
  passing xmltype('<descriptions>
  <description acad-plan="SAGSY1">
    <p>
      <strong>test?</strong>
    </p>
    <p>test</p>
  </description>
</descriptions>')
  returning content).getStringVal() as html
from dual;

Upvotes: 2

Related Questions