ZioByte

Reputation: 2994

Filtering using xslt for specific node values

I need to filter a huge and redundant XML file. The easy part is eliminating all nodes with no attributes and no content:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

But I also need to filter out nodes containing

<type>0</type>

nodes containing only

<whatever id="-1"/>

and nodes containing only empty attributes like:

  <dateacquired year="" month="" day="" long="" unformatted=""/>

An excerpt of my (machine-generated) input file is:

<record table="book" id="1">
<bookdata>
  <bookid unformatted="1">1</bookid>
  <marked bool="False">No</marked>
  <lastmodified year="2013" month="09" day="25" long="Wednesday, September 25, 2013" unformatted="20130925">09/25/2013</lastmodified>
  <title>Intervista Col Vampiro</title>
  <fulltitle>Ciclo Dei Vampiri: Intervista Col Vampiro</fulltitle>
  <fulltitle2>Intervista Col Vampiro (Ciclo Dei Vampiri)</fulltitle2>
  <referenceno>BB00001</referenceno>
  <publishdate year="1993" month="" day="" long="1993" unformatted="1993">1993</publishdate>
  <copyrightdate year="" month="" day="" long="" unformatted=""/>
  <type id="-1"/>
  <authors sort="Rice, Anne">
    <author id="1">
      <name>Anne Rice</name>
      <sortby>Rice, Anne</sortby>
      <roles/>
    </author>
  </authors>
  <credits/>
  <image1>
    <filename>Book_1_3.jpg</filename>
    <type>2</type>
    <notes/>
  </image1>
  <image2>
    <filename/>
    <type>0</type>
    <notes/>
  </image2>
  <image3>
    <filename/>
    <type>0</type>
    <notes/>
  </image3>
  <image4>
    <filename/>
    <type>0</type>
    <notes/>
  </image4>
  <image5>
    <filename/>
    <type>0</type>
    <notes/>
  </image5>
  <image6>
    <filename/>
    <type>0</type>
    <notes/>
  </image6>
  <image7>
    <filename/>
    <type>0</type>
    <notes/>
  </image7>
  <image8>
    <filename/>
    <type>0</type>
    <notes/>
  </image8>
  <image9>
    <filename/>
    <type>0</type>
    <notes/>
  </image9>
  <subtitle/>
  <titlesort>Intervista Col Vampiro</titlesort>
  <publisher id="1">Salani</publisher>
  <publicationplace id="-1"/>
  <isbn/>
  <lccn/>
  <lccallnum/>
  <dewey>823.9</dewey>
  <country id="-1"/>
  <pages unformatted="283">283</pages>
  <numberofsections unformatted="0">0</numberofsections>
  <printedby id="-1"/>
  <binding id="-1"/>
  <edition id="1">Ebook</edition>
  <printing id="-1"/>
  <language id="-1"/>
  <series id="1">Ciclo Dei Vampiri</series>
  <releaseno unformatted="0">0</releaseno>
  <originaltitle>Interview With The Vampire</originaltitle>
  <originalsubtitle/>
  <originalpublisher id="-1"/>
  <originalcountry id="-1"/>
  <originallanguage id="-1"/>
  <originalcopyright year="1976" month="" day="" long="1976" unformatted="1976">1976</originalcopyright>
  <price integer="8" fraction="0" unformatted="8.0">8.00</price>
  <value integer="0" fraction="0" unformatted="0.0">0.00</value>
  <sellingprice integer="0" fraction="0" unformatted="0.0">0.00</sellingprice>
  <changeinvalue>0.00</changeinvalue>
  <changeinvaluepr>0.00</changeinvaluepr>
  <condition id="-1"/>
  <appraiser id="-1"/>
  <insurance id="-1"/>
  <registered year="2005" month="09" day="10" long="Saturday, September 10, 2005" unformatted="20050910">09/10/2005</registered>
  <status id="-1"/>
  <dateacquired year="" month="" day="" long="" unformatted=""/>
  <acquiredfrom id="-1"/>
  <personalrating id="-1"/>
  <category id="1">Horror-Gotico</category>
  <subcategory id="-1"/>
  <owner id="-1"/>
  <location id="-1"/>
  <keywords>
    <keyword id="1">Vampiro</keyword>
    <keyword id="2">Vampiri</keyword>
  </keywords>
  <newbook bool="False">No</newbook>
  <onloan bool="False">No</onloan>
  <overdue bool="False">No</overdue>
  <borrower id="-1"/>
  <borrowercategory id="-1"/>
  <dateborrowed year="" month="" day="" long="" unformatted=""/>
  <datedue year="" month="" day="" long="" unformatted=""/>
  <reserved bool="False">No</reserved>
  <reservedto id="-1"/>
  <reserveddate year="" month="" day="" long="" unformatted=""/>
  <awards/>
  <awardyear/>
  <awarddetails/>
  <nominations/>
  <nominationyear/>
  <nominationdetails/>
  <custom01/>
  <custom02/>
  <custom03>http://www.ddunlimited.net/viewtopic.php?f=1079&amp;t=3749847</custom03>
  <custom04/>
  <custom05 id="-1"/>
  <custom06 id="-1"/>
  <custom07 id="-1"/>
  <custom08 id="-1"/>
  <custom09 year="" month="" day="" long="" unformatted=""/>
  <custom10 integer="0" fraction="0" unformatted="0.0">0.00</custom10>
  <custom11 bool="True">Yes</custom11>
  <custom12 bool="False">No</custom12>
  <custom13 bool="False">No</custom13>
  <custom14 bool="True">Yes</custom14>
  <custom15 bool="False">No</custom15>
  <custom16 bool="False">No</custom16>
  <custom17 bool="False">No</custom17>
  <custom18 bool="False">No</custom18>
  <notes>ed2k://|file|eBook.ITA.001.Anne.Rice.Intervista.Col.Vampiro.(doc.lit.pdf.rtf).[Hyps].rar|1998285|81D4C283C03E5787170A33C335577533|/</notes>
  <synopsis>A San Francisco alle soglie del 2000 il giornalista Mallory viene avvicinato da Louis De Point Du Lac, vampiro dal 1791, quando era un proprietario terriero presso New Orleans. Ridotto alla disperazione per la perdita della moglie e della figlioletta vieneiniziato alla sua tenebrosa e ferina esistenza da Lestat, collega di origini parigine, che cerca invano di far superare al discepolo l&apos;innata repulsione per l&apos;omicidio. Invano Louis si ciba di sangue di ratti e galline, e fà fuggire i servi incendiando la casa. Ormai Lestat lo domina e lo coinvolge in efferate uccisioni di innocenti. Una bimba orfana, Claudia, viene &quot;adottata&quot; dai due e si rivela feroce quant&apos;altri mai.</synopsis>
  <reviews/>
  <weblinks/>
  <weblinktype id="1"/>
  <filelinks/>
  <filelinktype id="1"/>
  <barcode/>
  <originalseries id="-1"/>
  <originalreleaseno unformatted="0">0</originalreleaseno>
  <readhistory/>
  <lastread year="" month="" day="" long="" unformatted=""/>
  <readcount unformatted="0">0</readcount>
  <dustjacketcondition id="-1"/>
  <dimensions_width integer="0" fraction="0" unformatted="0.0">0.00</dimensions_width>
  <dimensions_height integer="0" fraction="0" unformatted="0.0">0.00</dimensions_height>
  <dimensions_depth integer="0" fraction="0" unformatted="0.0">0.00</dimensions_depth>
  <coverprice integer="0" fraction="0" unformatted="0.0">0.00</coverprice>
  <coverprice_currency id="-1"/>
  <booklinks/>
</bookdata>
<contentsdata items="0"/>
</record>

The desired output would be:

<record table="book" id="1">
<bookdata>
  <bookid unformatted="1">1</bookid>
  <marked bool="False">No</marked>
  <lastmodified year="2013" month="09" day="25" long="Wednesday, September 25, 2013" unformatted="20130925">09/25/2013</lastmodified>
  <title>Intervista Col Vampiro</title>
  <fulltitle>Ciclo Dei Vampiri: Intervista Col Vampiro</fulltitle>
  <fulltitle2>Intervista Col Vampiro (Ciclo Dei Vampiri)</fulltitle2>
  <referenceno>BB00001</referenceno>
  <publishdate year="1993" month="" day="" long="1993" unformatted="1993">1993</publishdate>
  <authors sort="Rice, Anne">
    <author id="1">
      <name>Anne Rice</name>
      <sortby>Rice, Anne</sortby>
    </author>
  </authors>
  <image1>
    <filename>Book_1_3.jpg</filename>
    <type>2</type>
  </image1>
  <titlesort>Intervista Col Vampiro</titlesort>
  <publisher id="1">Salani</publisher>
  <dewey>823.9</dewey>
  <pages unformatted="283">283</pages>
  <numberofsections unformatted="0">0</numberofsections>
  <edition id="1">Ebook</edition>
  <series id="1">Ciclo Dei Vampiri</series>
  <releaseno unformatted="0">0</releaseno>
  <originaltitle>Interview With The Vampire</originaltitle>
  <originalcopyright year="1976" month="" day="" long="1976" unformatted="1976">1976</originalcopyright>
  <price integer="8" fraction="0" unformatted="8.0">8.00</price>
  <value integer="0" fraction="0" unformatted="0.0">0.00</value>
  <sellingprice integer="0" fraction="0" unformatted="0.0">0.00</sellingprice>
  <changeinvalue>0.00</changeinvalue>
  <changeinvaluepr>0.00</changeinvaluepr>
  <registered year="2005" month="09" day="10" long="Saturday, September 10, 2005" unformatted="20050910">09/10/2005</registered>
  <category id="1">Horror-Gotico</category>
  <keywords>
    <keyword id="1">Vampiro</keyword>
    <keyword id="2">Vampiri</keyword>
  </keywords>
  <newbook bool="False">No</newbook>
  <onloan bool="False">No</onloan>
  <overdue bool="False">No</overdue>
  <reserved bool="False">No</reserved>
  <custom03>http://www.ddunlimited.net/viewtopic.php?f=1079&amp;t=3749847</custom03>
  <custom10 integer="0" fraction="0" unformatted="0.0">0.00</custom10>
  <custom11 bool="True">Yes</custom11>
  <custom12 bool="False">No</custom12>
  <custom13 bool="False">No</custom13>
  <custom14 bool="True">Yes</custom14>
  <custom15 bool="False">No</custom15>
  <custom16 bool="False">No</custom16>
  <custom17 bool="False">No</custom17>
  <custom18 bool="False">No</custom18>
  <notes>ed2k://|file|eBook.ITA.001.Anne.Rice.Intervista.Col.Vampiro.(doc.lit.pdf.rtf).[Hyps].rar|1998285|81D4C283C03E5787170A33C335577533|/</notes>
  <synopsis>A San Francisco alle soglie del 2000 il giornalista Mallory viene avvicinato da Louis De Point Du Lac, vampiro dal 1791, quando era un proprietario terriero presso New Orleans. Ridotto alla disperazione per la perdita della moglie e della figlioletta vieneiniziato alla sua tenebrosa e ferina esistenza da Lestat, collega di origini parigine, che cerca invano di far superare al discepolo l&apos;innata repulsione per l&apos;omicidio. Invano Louis si ciba di sangue di ratti e galline, e fà fuggire i servi incendiando la casa. Ormai Lestat lo domina e lo coinvolge in efferate uccisioni di innocenti. Una bimba orfana, Claudia, viene &quot;adottata&quot; dai due e si rivela feroce quant&apos;altri mai.</synopsis>
  <weblinktype id="1"/>
  <filelinktype id="1"/>
  <originalreleaseno unformatted="0">0</originalreleaseno>
  <readcount unformatted="0">0</readcount>
  <dimensions_width integer="0" fraction="0" unformatted="0.0">0.00</dimensions_width>
  <dimensions_height integer="0" fraction="0" unformatted="0.0">0.00</dimensions_height>
  <dimensions_depth integer="0" fraction="0" unformatted="0.0">0.00</dimensions_depth>
  <coverprice integer="0" fraction="0" unformatted="0.0">0.00</coverprice>
</bookdata>
<contentsdata items="0"/>
</record>

The problem is that I do not really grok transformations and, while I tried to read about them, I didn't find a comprehensible tutorial. Any pointers welcome!

As an additional bonus I would also like to filter out specific "null" items like the above dimensions_*.

TiA

Upvotes: 2

Views: 2087

Answers (1)

Tomalak

Reputation: 338148

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

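  <!-- numeric comparison: drops every element whose text evaluates to 0, e.g. <type>0</type>
       and a parent whose only text is that 0, but also <releaseno>0</releaseno> or <value>0.00</value> -->
  <xsl:template match="*[normalize-space(.) = 0]" />
  <!-- drops elements that have no text and whose attributes (if any) are all empty -->
  <xsl:template match="*[normalize-space(.) = '' and count(@*[. = '']) = count(@*)]" />
  <!-- write more empty templates for nodes that should be removed -->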

</xsl:stylesheet>

Note that count(@*[. = '']) = count(@*) could be written as not(@*[. != '']) if you fancy that.
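If the goal is to reproduce the desired output from the question exactly (keeping elements such as numberofsections and value even though their text is 0), a narrower set of empty templates may fit better. The following is only a sketch under my reading of the three rules in the question: drop empty elements whose attributes are all empty, drop empty elements whose only attribute is id="-1", and drop any element that has a type child equal to 0. The xsl:strip-space and xsl:output lines are my own additions to tidy up the blank lines left behind by removed nodes.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- assumption: the input has no mixed content, so whitespace-only text can be discarded -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- identity template: copy everything that no other template claims -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- no text and no non-empty attribute, e.g. <credits/> or <dateacquired year="" .../> -->
  <xsl:template match="*[normalize-space(.) = '' and not(@*[. != ''])]"/>

  <!-- no text and id="-1" as the only attribute, e.g. <country id="-1"/> -->
  <xsl:template match="*[normalize-space(.) = '' and count(@*) = 1 and @id = '-1']"/>

  <!-- any element with a <type>0</type> child, e.g. <image2> ... <image9> -->
  <xsl:template match="*[type = '0']"/>
</xsl:stylesheet>
```

The three empty templates are written so that no element can match more than one of them, which avoids relying on how a processor resolves template conflicts. For the bonus case, one more empty template along the lines of match="*[starts-with(name(), 'dimensions_') and number(.) = 0]" should probably do, but note that the desired output in the question still keeps those elements.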

Upvotes: 1
