Reputation: 873
I have some XML I need to transform using XSLT. When I created my XSLT the data was in one format, but the format has since changed, so I need to update my XSLT accordingly.
The XSLT is supposed to create a raw-text element (RAW_TXT), and then strip out the metadata inside the sentence <S> tags and append it to the element names (e.g. <ENAMEX type="PERSON"... becomes ENAMEX_PERSON).
Previously the whole XML was <DOC> ... </DOC>, but now it's <NORMDOC> <DOC> ... </DOC> ... </NORMDOC>. I fixed my selection pattern to account for that, but now all the tags before <TXT> get stripped out, which didn't happen when my selection pattern was just DOC/. How do I change my XSLT so that it only does this stripping inside TXT?
Input
<NORMDOC>
<DOC>
<DOCID>123</DOCID>
<FI fitype="B" xref="12345">
<FIName>BA</FIName>
<FITIN>456</FITIN>
</FI>
<OIs>
<OI xref="54321">
<OIName>BA</OIName>
</OI>
</OIs>
<Subjects>
<Subject stype="PER" xref="111111">
<SubjectFullName type="L">DISNEY/WALT</SubjectFullName>
<SubjectLastName type="L">DISNEY</SubjectLastName>
<SubjectFirstName type="L">WALT</SubjectFirstName>
<SubjectPhone type="Work">1234567890</SubjectPhone>
<SubjectPhone type="Residence">9876543210</SubjectPhone>
</Subject>
</Subjects>
<TXT>
<S sid="123-SENT-001">INTRODUCTION this is being filed to report suspicious activity between customer<WH/>'<WH/>s personal account and his animation business.</S> <S sid="123-SENT-002">The following suspect was identified: <ENAMEX type="PERSON" id="PER-123-000">WALT DISNEY</ENAMEX>.</S> <S sid="123-SENT-003">The reportable amount is <NUMEX type="MONEY" id="MON-123-001">$123,456</NUMEX>.</S> <S sid="123-SENT-004">The suspicious activity took place between <TIMEX type="DATE" id="DAT-123-002">06/01/1923</TIMEX> and <TIMEX type="DATE" id="DAT-123-003">12/15/1966</TIMEX> at studios in <LOCEX type="LOCATION" id="LOC-123-004">Los Angeles</LOCEX>, <LOCEX type="STATE" id="STA-123-005">CA</LOCEX> (<ENAMEX type="BRANCH" id="BRA-123-006">Sixth &amp; Central</ENAMEX>; <LOCEX type="LOCATION" id="LOC-123-007">Wilshire</LOCEX>-<LOCEX type="LOCATION" id="LOC-123-008">La Brea</LOCEX>; <ENAMEX type="ORGANIZATION" id="ORG-123-009">La Brea-Rosewood</ENAMEX>; Melrose-Fairfax) and theatres in <LOCEX type="LOCATION" id="LOC-123-010">Los Angeles</LOCEX>, CA.</S>
</TXT>
</DOC>
<ENTINFO ID="ACC-123-081" TYPE="ACCOUNT" NORM="222222222" REFID="ACC-123-081" ACCT-TYPE="CHK" MENTION="account: animation studio checking account 222222222" />
</NORMDOC>
XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/">
<DOC>
<xsl:apply-templates select="NORMDOC/DOC/*" />
<xsl:apply-templates select="NORMDOC/DOC/TXT" mode="extra"/>
</DOC>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:value-of select="current()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="TXT">
<RAW_TXT>
<xsl:value-of select="current()"/>
</RAW_TXT>
</xsl:template>
<xsl:template match="TXT" mode="extra">
<TXT>
<xsl:for-each select="*">
<xsl:element name="{local-name()}">
<xsl:for-each select="*">
<xsl:variable name="type" select="@type"/>
<xsl:element name="{concat(name(), '_', $type)}">
<xsl:value-of select="current()"/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</TXT>
</xsl:template>
</xsl:stylesheet>
Actual Output
<DOC>
<DOCID>123</DOCID>
<FI>
BA
456
</FI>
<OIs>
BA
</OIs>
<Subjects>
DISNEY/WALT
DISNEY
WALT
1234567890
9876543210
</Subjects>
<RAW_TXT>
INTRODUCTION this is being filed to report suspicious activity between customer's personal account and his animation business. The following suspect was identified: WALT DISNEY. The reportable amount is $123,456. The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth & Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.
</RAW_TXT>
<TXT>
<S>
<WH_/>
<WH_/>
</S>
<S>
<ENAMEX_PERSON>WALT DISNEY</ENAMEX_PERSON>
</S>
<S>
<NUMEX_MONEY>$123,456</NUMEX_MONEY>
</S>
<S>
<TIMEX_DATE>06/01/1923</TIMEX_DATE>
<TIMEX_DATE>12/15/1966</TIMEX_DATE>
<LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
<LOCEX_STATE>CA</LOCEX_STATE>
<ENAMEX_BRANCH>Sixth & Central</ENAMEX_BRANCH>
<LOCEX_LOCATION>Wilshire</LOCEX_LOCATION>
<LOCEX_LOCATION>La Brea</LOCEX_LOCATION>
<ENAMEX_ORGANIZATION>La Brea-Rosewood</ENAMEX_ORGANIZATION>
<LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
</S>
</TXT>
</DOC>
Expected Output
<DOC>
<DOCID>123</DOCID>
<FI>
<FINAME>BA</FINAME><FITIN>456</FITIN>
</FI>
<OIs>
<OINAME>BA</OINAME>
</OIs>
<Subjects>
<SubjectFullName>DISNEY/WALT</SubjectFullName>
<SubjectLastName>DISNEY</SubjectLastName>
<SubjectFirstName>WALT</SubjectFirstName>
<SubjectPhone_Work>1234567890</SubjectPhone_Work>
<SubjectPhone_Residence>9876543210</SubjectPhone_Residence>
</Subjects>
<RAW_TXT>
INTRODUCTION this is being filed to report suspicious activity between customer's personal account and his animation business. The following suspect was identified: WALT DISNEY. The reportable amount is $123,456. The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth & Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.
</RAW_TXT>
<TXT>
<S>
<WH_/>
<WH_/>
</S>
<S>
<ENAMEX_PERSON>WALT DISNEY</ENAMEX_PERSON>
</S>
<S>
<NUMEX_MONEY>$123,456</NUMEX_MONEY>
</S>
<S>
<TIMEX_DATE>06/01/1923</TIMEX_DATE>
<TIMEX_DATE>12/15/1966</TIMEX_DATE>
<LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
<LOCEX_STATE>CA</LOCEX_STATE>
<ENAMEX_BRANCH>Sixth & Central</ENAMEX_BRANCH>
<LOCEX_LOCATION>Wilshire</LOCEX_LOCATION>
<LOCEX_LOCATION>La Brea</LOCEX_LOCATION>
<ENAMEX_ORGANIZATION>La Brea-Rosewood</ENAMEX_ORGANIZATION>
<LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
</S>
</TXT>
</DOC>
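The flattening of FI, OIs and Subjects in the actual output comes from the catch-all template in the stylesheet above: xsl:value-of writes only the string value of the current element, i.e. all of its descendant text concatenated, so the child elements themselves are lost. A minimal sketch of a structure-preserving replacement for that one template (meant to sit inside the same xsl:stylesheet, not to run on its own), which recurses into the children instead:

```xml
<!-- Hypothetical replacement for the catch-all match="*" template.
     xsl:copy emits the element itself (name only: no attributes, no children);
     xsl:apply-templates then processes its child nodes, so nested elements
     such as FIName and FITIN are copied instead of collapsed into text. -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
```

TXT is not affected by this change, because the more specific match="TXT" template takes priority over the * wildcard.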
Upvotes: 0
Views: 95
Reputation: 117073
AFAICT, the following stylesheet returns the expected result:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/NORMDOC">
<xsl:apply-templates select="DOC"/>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="TXT">
<RAW_TXT>
<xsl:value-of select="."/>
</RAW_TXT>
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="S">
<xsl:copy>
<xsl:apply-templates select="*" mode="extra"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*" mode="extra">
<xsl:element name="{name()}_{@type}">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
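One detail worth checking: xsl:strip-space elements="*" also removes the single space between consecutive S elements inside TXT, so RAW_TXT may read "business.The following" rather than "business. The following" as in the expected output. A possible variant of the stylesheet above, assuming that space matters: keep whitespace-only text nodes directly inside TXT, normalize the raw text, and copy only the S children of TXT.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- Strip whitespace-only text nodes everywhere except directly inside TXT:
       a plain element name (priority 0) wins over the * wildcard (priority -0.5) -->
  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="TXT"/>

  <!-- Skip the NORMDOC wrapper (and ENTINFO) by processing only DOC -->
  <xsl:template match="/NORMDOC">
    <xsl:apply-templates select="DOC"/>
  </xsl:template>

  <!-- Copy elements (without attributes) and recurse into their children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="TXT">
    <RAW_TXT>
      <!-- normalize-space trims the preserved newlines around the S elements -->
      <xsl:value-of select="normalize-space(.)"/>
    </RAW_TXT>
    <!-- Process only the S children, so the preserved whitespace is not copied -->
    <xsl:copy>
      <xsl:apply-templates select="S"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="S">
    <xsl:copy>
      <xsl:apply-templates select="*" mode="extra"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename each child of S to NAME_type (WH has no type, so it becomes WH_) -->
  <xsl:template match="*" mode="extra">
    <xsl:element name="{name()}_{@type}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

With this change RAW_TXT should read "...his animation business. The following suspect was identified: WALT DISNEY. The reportable amount is..."; the TXT part is unchanged.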
Upvotes: 1
Reputation: 1882
Overriding the identity rule is the best approach for your problem. This stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="NORMDOC">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="TXT">
<RAW_TXT>
<xsl:value-of select="."/>
</RAW_TXT>
<xsl:call-template name="identity"/>
</xsl:template>
<xsl:template match="TXT/S/text()|ENTINFO"/>
</xsl:stylesheet>
Output:
<DOC>
<DOCID>123</DOCID>
<FI fitype="B" xref="12345">
<FIName>BA</FIName>
<FITIN>456</FITIN>
</FI>
<OIs>
<OI xref="54321">
<OIName>BA</OIName>
</OI>
</OIs>
<Subjects>
<Subject stype="PER" xref="111111">
<SubjectFullName type="L">DISNEY/WALT</SubjectFullName>
<SubjectLastName type="L">DISNEY</SubjectLastName>
<SubjectFirstName type="L">WALT</SubjectFirstName>
<SubjectPhone type="Work">1234567890</SubjectPhone>
<SubjectPhone type="Residence">9876543210</SubjectPhone>
</Subject>
</Subjects>
<RAW_TXT>INTRODUCTION this is being filed to report suspicious activity between customer's personal account and his animation business.The following suspect was identified: WALT DISNEY.The reportable amount is $123,456.The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth & Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.</RAW_TXT>
<TXT>
<S sid="123-SENT-001">
<WH/>
<WH/>
</S>
<S sid="123-SENT-002">
<ENAMEX type="PERSON" id="PER-123-000">WALT DISNEY</ENAMEX>
</S>
<S sid="123-SENT-003">
<NUMEX type="MONEY" id="MON-123-001">$123,456</NUMEX>
</S>
<S sid="123-SENT-004">
<TIMEX type="DATE" id="DAT-123-002">06/01/1923</TIMEX>
<TIMEX type="DATE" id="DAT-123-003">12/15/1966</TIMEX>
<LOCEX type="LOCATION" id="LOC-123-004">Los Angeles</LOCEX>
<LOCEX type="STATE" id="STA-123-005">CA</LOCEX>
<ENAMEX type="BRANCH" id="BRA-123-006">Sixth & Central</ENAMEX>
<LOCEX type="LOCATION" id="LOC-123-007">Wilshire</LOCEX>
<LOCEX type="LOCATION" id="LOC-123-008">La Brea</LOCEX>
<ENAMEX type="ORGANIZATION" id="ORG-123-009">La Brea-Rosewood</ENAMEX>
<LOCEX type="LOCATION" id="LOC-123-010">Los Angeles</LOCEX>
</S>
</TXT>
</DOC>
Do note: the "bypass rule" for the NORMDOC element; the empty rule that strips the text-node children of S as well as the ENTINFO element and its descendants; and the named identity template, which lets the TXT rule override the identity rule without losing the ability to reuse it.
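The output shown for this answer keeps the original element names and attributes inside TXT. If the renamed form from the question (ENAMEX_PERSON and so on) is wanted, the identity-rule stylesheet can be extended with one more rule. A sketch, assuming only the children of S need renaming; the elements before TXT are still copied unchanged, attributes included:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity rule, named so other rules can reuse it -->
  <xsl:template match="node()|@*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Bypass rule: drop the NORMDOC wrapper but process its children -->
  <xsl:template match="NORMDOC">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="TXT">
    <RAW_TXT>
      <xsl:value-of select="."/>
    </RAW_TXT>
    <xsl:call-template name="identity"/>
  </xsl:template>

  <!-- Added rule: rename each child of S to NAME_type and copy only its content,
       so attributes such as type and id are dropped (WH has no type: WH_) -->
  <xsl:template match="TXT/S/*">
    <xsl:element name="{name()}_{@type}">
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Empty rules: drop the sentence text, the attributes of S, and ENTINFO -->
  <xsl:template match="TXT/S/text()|TXT/S/@*|ENTINFO"/>
</xsl:stylesheet>
```

Inside TXT this should produce plain <S> elements containing <WH_/>, <ENAMEX_PERSON>WALT DISNEY</ENAMEX_PERSON>, <NUMEX_MONEY>$123,456</NUMEX_MONEY> and so on. The added rules use a default priority of 0.5, which beats the identity rule's -0.5.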
Upvotes: 1