Dharmendra Kumar Singh

Reputation: 195

Replace a text node with an element when the text matches a term in a BaseX document

I am trying to replace part of a text node with an element when the document contains a given term. The query I have tried is shown below, but it fails with the error "Target is not an element, text, attribute, comment or pi".

Input XML:

<book>
<p>Isn't it lovely here? Very smart. We'll be like three queens when you've finished with us,
    Edie. You doing well then?</p>
<p>
    <name type="person">April De Angelis</name>’ plays include <title type="work">Positive
        Hour</title> (Out of Joint) <title type="work">Playhouse Creatures</title> (<name
        type="org">Sphinx Theatre Company</name>), <title type="work">Hush</title> (<name
        type="org">Royal Court</name>), <title type="work">Soft Vengeance</title>, <title
        type="work">The Life and Times of Fanny Hill</title> (adapted from the <name type="org"
        >John Cleland novel</name>) and <title type="work">Ironmistress</title>. Her work for
    radio includes <title>The Outlander</title> (<name type="org">Radio 5</name>), which won the
        <name type="org">Writers’ Guild Award</name> (<date>1992</date>), and, for opera, <title
        type="work">Flight</title> with composer <name type="person">Jonathan Dove</name> (<name
        type="place">Glyndebourne</name>, <date>1998</date>).</p>
 </book>

Expected output:

<book>
<p>Isn't it lovely here? Very smart. We'll be like three <highlight>queens</highlight> when
    you've finished with us, Edie. You doing well then?</p>
<p>
    <name type="person">April De Angelis</name>’ plays <highlight>include</highlight>
    <title type="work">Positive Hour</title> (Out of Joint) <title type="work">Playhouse
        Creatures</title> (<name type="org">Sphinx Theatre Company</name>), <title type="work"
        >Hush</title> (<name type="org">Royal Court</name>), <title type="work">Soft
        Vengeance</title>, <title type="work">The Life and Times of Fanny Hill</title> (adapted
    from the <name type="org">John Cleland novel</name>) and <title type="work"
        >Ironmistress</title>. Her work for radio includes <title>The Outlander</title> (<name
        type="org">Radio 5</name>), which won the <name type="org">Writers’ Guild Award</name>
        (<date>1992</date>), and, for opera, <title type="work">Flight</title> with composer
        <name type="person">Jonathan Dove</name> (<name type="place">Glyndebourne</name>,
        <date>1998</date>).</p>
</book>

I am using BaseX version 9.5.1. Below is the code.

let $body := <indexedterms>
        <content>
            <terms>
                <term>include</term>
                <term>Queens</term>
            </terms>
            <uri>/IEEE/IEEE/test.xml</uri>
        </content>
     </indexedterms>

for $contents in $body/content
let $uri := $contents/uri
let $doc := fn:doc($uri)
for $selectedterm in $contents/terms/term/string()
let $Modifieddoc := copy $c := $doc
                    modify
                       (
                          for $nodes in $c//*//text()[fn:matches(.,$selectedterm)]/parent::*
                          return
                          if($nodes/node()[fn:matches(.,$selectedterm)]/parent::*:highlight)
                          then ()
                          else
                          replace node  $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>
                       )
                   return $c
return                       
db:replace('IEEE',substring-after($uri,'/IEEE'),$Modifieddoc)                

Previously I was using "replace node $nodes/node()[fn:contains(.,$selectedterm)] with <highlight>{$selectedterm}</highlight>" instead of "replace node $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>". That worked, but for terms that share a stem (e.g. include, includes) it matched both words, which is not correct, so I changed the code to "replace node $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>".

Upvotes: 0

Views: 311

Answers (1)

Martin Honnen

Reputation: 167436

$nodes/$selectedterm is probably the culprit and most likely not what you want: the $selectedterm variable is bound to a string value (you bind for $selectedterm in $contents/terms/term/string()), so the path expression returns that string rather than a node, while replace node needs a node as its target. It would help us understand what you want to achieve if you showed us a sample document that you load with the doc function and the update you want BaseX to perform on it, for instance for the two sample terms in your code snippet.
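To make the difference concrete, here is a minimal sketch that contrasts the two path expressions. The variables $term and $p are made up for illustration and are not part of the original query.

```xquery
(: Illustrative only: $term and $p are hypothetical stand-ins for
   $selectedterm and one of the $nodes elements from the question. :)
let $term := 'include'
let $p := <p>plays include more</p>
return (
  (: the last step is a variable reference, so the path yields the string itself :)
  $p/$term instance of xs:string,
  (: this path yields a text node, which is a valid target for replace node :)
  $p/text()[contains(., $term)] instance of text()
)
```

Both tests should evaluate to true, which is why the first form cannot be used as the target of replace node.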

Your task of identifying and wrapping search terms in your text content can be done nicely in XSLT 2 or 3, which you can run with BaseX if you put Saxon 9.9, 10 or 11 on the class path. Here is an XSLT 3 example:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
  <xsl:param name="terms" as="xs:string*" select="'include', 'Queens'"/>

  <xsl:output method="xml" indent="no"/>
  
  <xsl:template match="p//text()">
    <xsl:apply-templates select="analyze-string(., string-join($terms, '|'), 'i')/node()"/>
  </xsl:template>
  
  <xsl:template match="fn:match">
    <highlight>{.}</highlight>
  </xsl:template>
  
  <xsl:template match="fn:non-match">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>

As the analyze-string function used here also exists in BaseX/XQuery, you should also be able to apply XQuery Update to the result of calling that function, i.e. by replacing the fn:match elements with highlight elements.
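A rough sketch of that XQuery Update approach could look like the following. It is not tested against your database: the inline $doc stands in for the stored document, the terms are assumed to be single words, and instead of joining the terms into one regular expression it splits each text node into words with '\w+' and compares whole words case-insensitively, so that "includes" is not treated as a match for "include".

```xquery
(: Sketch only: the inline $doc is a stand-in for the stored document, and
   the terms are assumed to be single words compared case-insensitively. :)
let $terms := ('include', 'Queens') ! lower-case(.)
let $doc :=
  <book>
    <p>Like three queens. Her plays include <title>Hush</title>; her radio work includes more.</p>
  </book>
return
  copy $c := $doc
  modify (
    (: skip text that is already wrapped in a highlight element :)
    for $text in $c//p//text()[not(parent::highlight)]
    (: split the text node into words (fn:match) and everything else (fn:non-match) :)
    let $parts := analyze-string($text, '\w+')/*
    (: only touch text nodes that contain at least one whole-word term :)
    where $parts[self::fn:match][lower-case(.) = $terms]
    return replace node $text with (
      for $part in $parts
      return
        if ($part/self::fn:match and lower-case($part) = $terms)
        then <highlight>{ string($part) }</highlight>
        else text { string($part) }
    )
  )
  return $c
```

In your query the copied document would then be passed to db:replace as before. Terms made up of several words would need a different tokenization, so treat this as a starting point rather than a finished solution.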

Upvotes: 0
