Reputation: 31
I have a tokenized variable that contains a list of filenames taken from a .txt directory listing. I want to look for those filenames in a number of XML files spread across a number of subdirectories. If a filename is found, I want to output that "filename" was found in "xmlfile".
There are a lot of XML directories and they are not static. The same goes for the XML files. The filenames are not tagged in the XML, so I'm just looking for their plain-text occurrence in the file.
Any help would be appreciated.
To make the example concrete, I want to use
$filenames_to_find (tokenized list of filenames from a .txt directory listing)
to search against
dir1/*.xml
dir2/*.xml
with the output being
filename was found in xmlfilename
I'm using an academic version of Oxygen XML, so I think I have Saxon through that, and I also have the standalone Saxon file for running this from the command line.
Thanks to the answers so far and more Google searches, I've gotten this far, but it doesn't work. I know it's broken, but I don't know how to fix it!
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:h="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xs"
version="3.0"
expand-text="yes"
>
<xsl:variable name="filenames_from_directory_listing" as="xs:string" select="unparsed-text('filenames_from_directory_listing.txt')"/>
<xsl:variable name="filenames_to_find" select="tokenize($filenames_from_directory_listing, '\s+')"/>
<xsl:template match="/">
<xsl:for-each select="collection('.?select=*.xml;recurse=yes')"/>
<xsl:variable name="xml_filenames" select="."/>
<xsl:for-each select="$filenames_to_find">
<xsl:if test="(contains($t, .))">
<xsl:message>{document-uri($xml_filenames)} contains {.}</xsl:message>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Any suggestions? Clearly I am an XSL novice. Thanks for your patience.
Upvotes: 0
Views: 48
Reputation: 163458
Assuming Saxon, or another product that maps collection URIs to directory filenames in a similar way, you can do
<xsl:for-each select="collection('.?select=*.xml;recurse=yes')">
  <xsl:variable name="doc" select="."/>
  <xsl:for-each select="$filenames">
    <xsl:if test="some $t in $doc//text() satisfies contains($t, .)">
      <xsl:message>{document-uri($doc)} contains {.}</xsl:message>
    </xsl:if>
  </xsl:for-each>
</xsl:for-each>
Actually, you could replace the xsl:if test with test="contains($doc, .)", but that might be less efficient if the documents are large, since it involves assembling the whole string value of the document in memory.
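For reference, here is a minimal sketch of how the loop might sit inside a complete stylesheet, using the simpler contains($doc, .) test. It assumes Saxon's directory-style collection URIs, keeps the listing file name from the question, and uses $filenames as the variable name. The [. ne ''] filter is my own addition: tokenizing on whitespace can yield an empty string, and contains() is true for an empty search string, so every document would otherwise appear to match.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0"
    expand-text="yes">

  <!-- Read the directory listing and split it on whitespace.
       The predicate drops empty tokens (an assumption-driven safeguard, see above). -->
  <xsl:variable name="filenames" as="xs:string*"
      select="tokenize(unparsed-text('filenames_from_directory_listing.txt'), '\s+')[. ne '']"/>

  <xsl:template match="/">
    <!-- The for-each must wrap everything that uses the current document;
         it cannot be self-closing. -->
    <xsl:for-each select="collection('.?select=*.xml;recurse=yes')">
      <!-- Remember the document, because the inner loop changes the context item. -->
      <xsl:variable name="doc" select="."/>
      <xsl:for-each select="$filenames">
        <!-- Here "." is the current filename; $doc is atomized to its string value. -->
        <xsl:if test="contains($doc, .)">
          <xsl:message>{.} was found in {document-uri($doc)}</xsl:message>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Compared with the attempt in the question, the outer xsl:for-each is no longer self-closing and the undefined $t is gone. Note that splitting on '\s+' also breaks up any filename that contains a space, so a listing with such names would need a line-based split instead.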
Another alternative would be to process the files as unparsed text files rather than XML files, but that would require some tinkering with the Saxon configuration so it doesn't automatically parse files with a '.xml' file extension as XML.
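One route that may sidestep the configuration change is to ask only for the file URIs with uri-collection() and read each one with unparsed-text(). This is a sketch, not something I have checked against every Saxon release: it assumes uri-collection() accepts the same directory query syntax as collection(), and it reuses the hypothetical $filenames variable and the listing file name from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0"
    expand-text="yes">

  <!-- Same list of names as before; empty tokens are dropped. -->
  <xsl:variable name="filenames" as="xs:string*"
      select="tokenize(unparsed-text('filenames_from_directory_listing.txt'), '\s+')[. ne '']"/>

  <xsl:template match="/">
    <!-- uri-collection() returns URIs only, so nothing is parsed as XML here
         (assumption: the directory query syntax is supported as for collection()). -->
    <xsl:for-each select="uri-collection('.?select=*.xml;recurse=yes')">
      <xsl:variable name="uri" select="."/>
      <!-- Read the file as plain text, markup included. -->
      <xsl:variable name="text" select="unparsed-text($uri)"/>
      <xsl:for-each select="$filenames">
        <xsl:if test="contains($text, .)">
          <xsl:message>{.} was found in {$uri}</xsl:message>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because this searches the raw text, it also reports names that occur only in attribute values or comments, which the text-node test does not see. A file that cannot be decoded would raise an error from unparsed-text(); guarding the read with unparsed-text-available() is one way to skip such files.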
Upvotes: 1