I am trying to parse an HTML page using BaseX. Given this fragment:
<td colspan="2" rowspan="1" class="light comment2 last2">
<img class="textalign10" src="templates/comment10.png"
alt="*" width="10" height="10" border="0"/>
<a shape="rect" href="mypage.php?userid=26682">user</a>
: the text I'd like to keep [<a shape="rect"
href="http://alink" rel="nofollow">Link</a>] . with that part too.
</td>
I need to extract the message, including the <a> HTML link, and remove the ":" characters at the beginning.
I would like to obtain this exact text:
<message>
the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
</message>
Using this function,
declare
function gkm:node_message_from_comment($comment as item()*) {
if ($comment) then
copy $c := $comment
modify (
delete node $c/img[1],
delete node $c/a[1],
delete node $c/@*,
rename node $c as 'message'
)
return $c
else ()
};
With it I can extract the text, but I have not managed to remove the ":" from the beginning, i.e. I get:
<message>
: the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
</message>
Using XQuery Update and transform expressions seems a bit overcomplicated to me. Instead, you can select the nodes following the mypage.php link; with more knowledge of the input, there might be better ways to select the required nodes.
To cut off the ": " prefix, use substring-after(). The same pattern (cut ": " off the first result node and return all others as they are) also works with transform expressions, if you insist on using them.
let $comment :=<td colspan="2" rowspan="1" class="light comment2 last2">
<img class="textalign10" src="templates/comment10.png" alt="*" width="10" height="10" border="0"/>
<a shape="rect" href="mypage.php?userid=26682">user</a>
: the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
</td>
let $result := $comment/a[starts-with(@href, 'mypage.php')]/following-sibling::node()
return <message>{
$result[1]/substring-after(., ': '),
$result[position() > 1]
}</message>
As BaseX supports XQuery 3.0, you could also take advantage of the helper functions head() and tail():
return <message>{
head($result)/substring-after(., ': '),
tail($result)
}</message>
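If you do want to keep the copy/modify/return function from the question, the "trim the first node, keep the rest" pattern can go into its modify clause. The following is a sketch, not tested against a specific BaseX version: it adds one replace value of node step for the first text node after the user link. The gkm namespace URI is a placeholder (the question does not show its declaration), and the parameter type is narrowed to element()? because copy only accepts a single node.

```xquery
(: Sketch: the question's function plus one extra update that trims the
   leading ": " from the first text node after the user link.
   Assumptions: the gkm namespace URI below is a placeholder, and the
   parameter is narrowed to element()? because copy needs a single node. :)
declare namespace gkm = "http://example.org/gkm";

declare function gkm:node_message_from_comment($comment as element()?) as element()? {
  if ($comment) then
    copy $c := $comment
    modify (
      (: all targets are selected on the unmodified copy, so a[1] is still the
         user link here; "for" skips the update if no such text node exists :)
      (for $first in $c/a[1]/following-sibling::text()[1]
       return replace value of node $first with substring-after($first, ': ')),
      delete node $c/img[1],
      delete node $c/a[1],
      delete node $c/@*,
      rename node $c as 'message'
    )
    return $c
  else ()
};

gkm:node_message_from_comment(
  <td colspan="2" rowspan="1" class="light comment2 last2">
    <img class="textalign10" src="templates/comment10.png" alt="*" width="10" height="10" border="0"/>
    <a shape="rect" href="mypage.php?userid=26682">user</a>
    : the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
  </td>
)
```

Because all update targets are collected before any of them is applied, the order of the statements inside modify does not matter here.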