sylar_80

Reputation: 375

Extract text from XML using XSLT 1.0

I have the following XML file, which is part of a metadata record (I extracted just one part of it):

<?xml version="1.0" encoding="UTF-8"?>
        <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:geonet="http://www.fao.org/geonetwork" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd http://www.isotc211.org/2005/srv http://schemas.opengis.net/iso/19139/20060504/srv/srv.xsd">
        <gmd:pointOfContact>
                    <gmd:CI_ResponsibleParty>
                       <gmd:individualName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>Freddie Mercury</gco:CharacterString>
                          <gmd:PT_FreeText>
                             <gmd:textGroup>
                                <gmd:LocalisedCharacterString locale="#ITA">Pippo</gmd:LocalisedCharacterString>
                             </gmd:textGroup>
                          </gmd:PT_FreeText>
                       </gmd:individualName>
                       <gmd:organisationName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>The Queen</gco:CharacterString>
                          <gmd:PT_FreeText>
                             <gmd:textGroup>
                                <gmd:LocalisedCharacterString locale="#ITA">Music Institute</gmd:LocalisedCharacterString>
                             </gmd:textGroup>                         
                          </gmd:PT_FreeText>
                       </gmd:organisationName>
                       <gmd:positionName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>Singer</gco:CharacterString>                              
                       </gmd:positionName>
                       <gmd:contactInfo>
                          <gmd:CI_Contact>
                             <gmd:phone>
                                <gmd:CI_Telephone>
                                   <gmd:voice>
                                      <gco:CharacterString>123456789</gco:CharacterString>
                                   </gmd:voice>
                                   <gmd:facsimile>
                                      <gco:CharacterString>123456789</gco:CharacterString>
                                   </gmd:facsimile>
                                </gmd:CI_Telephone>
                             </gmd:phone>
                             <gmd:address>
                                <gmd:CI_Address>                                  
                                   <gmd:city>
                                      <gco:CharacterString>Zanzibar</gco:CharacterString>
                                   </gmd:city>                                   
                                   <gmd:postalCode>
                                      <gco:CharacterString>00001</gco:CharacterString>
                                   </gmd:postalCode>
                                   <gmd:country>
                                      <gco:CharacterString>India</gco:CharacterString>
                                   </gmd:country>
                                   <gmd:electronicMailAddress>
                                      <gco:CharacterString>[email protected]</gco:CharacterString>
                                   </gmd:electronicMailAddress>
                                </gmd:CI_Address>
                             </gmd:address>
                             <gmd:transferOptions>
                                <gmd:MD_DigitalTransferOptions>
                                   <gmd:onLine>
                                      <gmd:CI_OnlineResource>
                                         <gmd:linkage>
                                            <gmd:URL>http://www.google.it</gmd:URL>
                                         </gmd:linkage>
                                      </gmd:CI_OnlineResource>
                                   </gmd:onLine>
                                </gmd:MD_DigitalTransferOptions>
                             </gmd:transferOptions>
                         </gmd:CI_Contact>
                       </gmd:contactInfo>
                       <gmd:role>
                          <gmd:CI_RoleCode codeListValue="pointOfContact" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/ML_gmxCodelists.xml#CI_RoleCode"/>
                       </gmd:role>
                    </gmd:CI_ResponsibleParty>
        </gmd:pointOfContact>
        </gmd:MD_Metadata>

I would like to extract only two pieces of information from this file:

  1. the text Freddie Mercury (gco:CharacterString)
  2. the URL http://www.google.it (gmd:URL)

I started with the following XSLT transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8" />

<gmd:MD_Metadata xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" gco:isoType="gmd:MD_Metadata"> 

    <xsl:template match="//gmd:pointOfContact">
        <xsl:apply-templates select="gco:CharacterString" />
    </xsl:template>

    <xsl:template match="gco:CharacterString">
        <xsl:text>name=</xsl:text>          
    </xsl:template>
</gmd:MD_Metadata>
</xsl:stylesheet>

but it does not work.

Could you please help me achieve this?

Upvotes: 0

Views: 358

Answers (1)

badperson

Reputation: 1624

I don't know whether you want to output only the text or create a new XML document, but I think this stylesheet picks up the elements you want:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:gmd="http://www.isotc211.org/2005/gmd" 
xmlns:gts="http://www.isotc211.org/2005/gts" 
xmlns:gco="http://www.isotc211.org/2005/gco" 
xmlns:gml="http://www.opengis.net/gml" 
xmlns:geonet="http://www.fao.org/geonetwork">

<!-- create new root element -->
<xsl:template match="/">
    <root>
        <xsl:apply-templates/>
    </root>
</xsl:template>

<!-- default template: walks the tree and outputs nothing for nodes that have no more specific template -->
<xsl:template match="node()|@*">
        <xsl:apply-templates select="node()|@*"/>    
</xsl:template>

<!-- output only on nodes we select -->
<xsl:template match="node()|@*" mode="output">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*" mode="output"/>
    </xsl:copy>
</xsl:template>    

<!-- match our two nodes and wrap in match tag. -->
<xsl:template match="gco:CharacterString[ancestor::gmd:individualName] | gmd:URL">
    <match>
        <xsl:apply-templates mode="output"/>
    </match>
</xsl:template>

</xsl:stylesheet>

This creates the output:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:gmd="http://www.isotc211.org/2005/gmd"     xmlns:gts="http://www.isotc211.org/2005/gts"
xmlns:gco="http://www.isotc211.org/2005/gco"     xmlns:gml="http://www.opengis.net/gml"
xmlns:geonet="http://www.fao.org/geonetwork">
<match>Freddie Mercury</match>
<match>http://www.google.it</match>
</root>

If you just want text output, you can specify that with <xsl:output method="text"/>. Also, your description said you only wanted to output Freddie Mercury. gmd:individualName is unique in this case, but I'm not sure what tagging variations exist across the set of files you want to use this for.
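For the text-output case, a minimal sketch could look like the following. It assumes you want one value per line; the name= label is taken from your attempt, while the url= label is just my placeholder. In XSLT 1.0, xsl:value-of on a node-set returns the string value of the first node in document order, so this only picks up the first individual name and the first URL.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">

    <!-- emit plain text instead of XML -->
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- a single template on the document root: nothing else is processed,
         so no stray text from other elements leaks into the output -->
    <xsl:template match="/">
        <xsl:text>name=</xsl:text>
        <!-- first gco:CharacterString directly under gmd:individualName -->
        <xsl:value-of select="//gmd:individualName/gco:CharacterString"/>
        <!-- &#10; is a line feed; "url=" is a hypothetical label -->
        <xsl:text>&#10;url=</xsl:text>
        <xsl:value-of select="//gmd:URL"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the sample file in the question, this should print name=Freddie Mercury on the first line and url=http://www.google.it on the second. Note that the gmd and gco prefixes have to be declared on xsl:stylesheet itself for the XPath expressions to resolve.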

This file contains only one gmd:URL element. Again, I'm not sure what variation might exist in your other files, but this produces the output asked for in your question.
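If the real files do contain several contacts or several URLs, one option is to anchor the paths to gmd:pointOfContact and loop over each responsible party. The sketch below assumes that structure; the output element names contact, name and url are hypothetical, and exclude-result-prefixes is only there to keep the gmd/gco namespace declarations off the result elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    exclude-result-prefixes="gmd gco">

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:template match="/">
        <root>
            <!-- one <contact> per responsible party under gmd:pointOfContact -->
            <xsl:for-each select="//gmd:pointOfContact/gmd:CI_ResponsibleParty">
                <contact>
                    <name>
                        <!-- relative path: only this party's individual name -->
                        <xsl:value-of select="gmd:individualName/gco:CharacterString"/>
                    </name>
                    <!-- .// searches below the current party only,
                         so URLs of other contacts are not mixed in -->
                    <xsl:for-each select=".//gmd:CI_OnlineResource/gmd:linkage/gmd:URL">
                        <url>
                            <xsl:value-of select="."/>
                        </url>
                    </xsl:for-each>
                </contact>
            </xsl:for-each>
        </root>
    </xsl:template>

</xsl:stylesheet>
```

For the sample file this should yield a single contact element holding Freddie Mercury and http://www.google.it. The localized string Pippo is not included, because the name path selects gco:CharacterString only and not gmd:LocalisedCharacterString.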

Upvotes: 1
