Reputation: 1553
I need to parse an HTML table:
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
and as a result to get:
+------+----+----+------+
|class |code|em |value |
+------+----+----+------+
|gray_1|USD | 1| 11.11|
|gray_2|AUD | 1| 22.22|
|gray_9|IRR | 100| 33.33|
|gray_1|EUR | 1| 44.44|
I tried to do it like this:
with tbl as
(
select xmltype('
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
') xml_data from dual
)
select
*
from
tbl,
xmltable('//table/tbody/tr'
passing tbl.xml_data
columns
data varchar2(128) path './td'
)
But this causes an exception:
ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence
What do I need to do to fix my code?
Upvotes: 0
Views: 3184
Reputation: 191570
Your path
is looking for a td
under the tr
; but there are two, hence the "got multi-item sequence" error you're seeing. You can reference each td
tag by its position, as td[1]
etc. It's very reliant on the table structure being as expected though.
With this specific example you can do:
with tbl as
(
select xmltype('
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
') xml_data from dual
)
select
x.class, x.currency, x.amount, to_number(x.rate) as rate
from
tbl
cross join
xmltable('/table/tbody/tr'
passing tbl.xml_data
columns
class varchar2(10) path '@class',
currency varchar2(3) path 'td[1]/span',
amount number path 'td[1]/em',
rate varchar2(50) path 'td[2]'
) x
where
x.currency is not null
which gets:
CLASS CUR AMOUNT RATE
---------- --- ---------- ----------
gray_1 USD 1 476.16
gray_2 AUD 1 327.65
gray_9 IRR 100 1.13
gray_1 EUR 1 526.54
It won't take much variation in the HTML to break it though. See this answer for some reasons it is fragile, and why it generally considered unwise to try to parse HTML as XML.
Upvotes: 2