Reputation: 668
I have a Python script that uses lxml to change the values of specific tags. I have the following XML:
<gmd:CI_Citation>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">Publication</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">Creation</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="revision">Revision</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
</gmd:CI_Citation>
For each date type (Publication, Creation and Revision) I want to change the date to a specific date; however, the tag path is the same for all three:
//:gmd_citation/:gmd_CI:Citation/:gmd_date/:gmd_CI_Date/:gmd_date/:gco_Date
I am using the following function to change the values:
def updateXMLTag(tag, value):
    # root is the parsed lxml root element, defined elsewhere in the script
    xmlValue = root.xpath(tag)
    xmlValue[0].text = str(value)
What is the best way to use XPath to get to the specific tag, so that its value can be changed?
Upvotes: 0
Views: 102
Reputation: 18762
This is my way of using xpath to get to the specific elements, and edit them:
# Find the best implementation available on the platform
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
from lxml import etree
# proper namespaces added to get valid xml
xmlstr = StringIO("""<gmd:CI_Citation xmlns:gmd="http://gmd.example.com" xmlns:gco="http://gco.example.com">
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">Publication</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">Creation</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>1900-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="revision">Revision</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
</gmd:CI_Citation>""")
tree = etree.parse(xmlstr)
Here we use XPath to get all three target elements:
targets = tree.xpath('/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:dateType/gmd:CI_DateTypeCode',
                     namespaces={'gmd': "http://gmd.example.com", 'gco': "http://gco.example.com"})
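The namespace URIs above are placeholders that were added only so the sample parses. In a real file, one option is to read the prefix map from the root element instead of hard-coding it. This is a minimal sketch, assuming the prefixes are declared on the root element; ns_from_root is a hypothetical helper name, and the trimmed sample reuses the same placeholder URIs:

```python
from lxml import etree

# Trimmed sample with placeholder namespace URIs (assumption: a real file
# declares its own URIs on the root element in the same way).
SAMPLE = b"""<gmd:CI_Citation xmlns:gmd="http://gmd.example.com" xmlns:gco="http://gco.example.com">
  <gmd:date><gmd:CI_Date>
    <gmd:date><gco:Date>1900-01-01</gco:Date></gmd:date>
    <gmd:dateType><gmd:CI_DateTypeCode codeListValue="creation">Creation</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
</gmd:CI_Citation>"""


def ns_from_root(root):
    """Hypothetical helper: build a prefix -> URI map usable with xpath().

    lxml's xpath() rejects a None prefix (the default namespace), so drop it.
    """
    return {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}


root = etree.fromstring(SAMPLE)
ns = ns_from_root(root)
codes = root.xpath('/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:dateType/gmd:CI_DateTypeCode',
                   namespaces=ns)
print(ns)
print([c.get('codeListValue') for c in codes])
```

If the prefixes are declared deeper in the document rather than on the root, this map would miss them, and passing the URIs explicitly is the safer choice.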
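The three elements are distinguished by their codeListValue attribute values,
which can be checked with a small helper function, hasattr (note that this name shadows Python's built-in hasattr):
def hasattr(elem, att, val):
    # True only if the attribute exists and equals val
    try:
        return elem.attrib[att] == val
    except KeyError:
        return False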
targets[0]: codeListValue "publication", text node "Publication"
targets[1]: codeListValue "creation", text node "Creation"
targets[2]: codeListValue "revision", text node "Revision"
Which one needs changes?
hasattr(targets[0], 'codeListValue', 'publication') # True
hasattr(targets[1], 'codeListValue', 'creation') # True
hasattr(targets[2], 'codeListValue', 'publication') # False
# Let's change one of them
t1 = targets[1]
t1.text = 'New Creation' # change text node
# and/or change attribute
t1.attrib['codeListValue'] = 'Latest Creation'
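As an alternative to indexing targets by position, each element's attribute can be read with lxml's elem.get(), which returns None when the attribute is missing. The sketch below indexes the elements by codeListValue; the name by_kind and the trimmed sample are illustrative and not part of the original script:

```python
from lxml import etree

NS = {'gmd': "http://gmd.example.com", 'gco': "http://gco.example.com"}

# Trimmed sample using the same placeholder namespace URIs as above.
SAMPLE = b"""<gmd:CI_Citation xmlns:gmd="http://gmd.example.com" xmlns:gco="http://gco.example.com">
  <gmd:date><gmd:CI_Date>
    <gmd:dateType><gmd:CI_DateTypeCode codeListValue="publication">Publication</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
  <gmd:date><gmd:CI_Date>
    <gmd:dateType><gmd:CI_DateTypeCode codeListValue="creation">Creation</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
</gmd:CI_Citation>"""

root = etree.fromstring(SAMPLE)
targets = root.xpath('//gmd:CI_DateTypeCode', namespaces=NS)

# Index the elements by their codeListValue attribute so that the one to
# change can be picked by name rather than by position in the list.
by_kind = {elem.get('codeListValue'): elem for elem in targets}

by_kind['creation'].text = 'New Creation'
print(sorted(by_kind))
print(by_kind['creation'].text)
```

Looking elements up by name keeps the script working if the order of the date blocks in the file changes.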
Finally, we save the result to a file:
tree.write("output1.xml")
Edit 1
Here we navigate from the already-found targets[1] to its cousin, the gco:Date element whose value needs to change:
t1 = targets[1]
parent1 = t1.getparent()
date1 = parent1.getprevious()
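cousin1 = list(date1)  # getchildren() is deprecated; list(element) is equivalent
len(cousin1)     # 1
cousin1[0].text  # '1900-01-01'
# change the date (zero-padded to match the YYYY-MM-DD format of the existing values)
cousin1[0].text = '2017-05-03'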
# again, write the result
tree.write("out456.xml")
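The sibling navigation above can also be folded into a single XPath expression with a predicate, so that the gco:Date for a given date type is selected directly. This is a sketch under the same placeholder namespace URIs; set_date and DATE_BY_KIND are hypothetical names, and $kind is an XPath variable that lxml fills in from the keyword argument:

```python
from lxml import etree

NS = {'gmd': "http://gmd.example.com", 'gco': "http://gco.example.com"}

# Trimmed sample using the same placeholder namespace URIs as above.
SAMPLE = b"""<gmd:CI_Citation xmlns:gmd="http://gmd.example.com" xmlns:gco="http://gco.example.com">
  <gmd:date><gmd:CI_Date>
    <gmd:date><gco:Date>1900-01-01</gco:Date></gmd:date>
    <gmd:dateType><gmd:CI_DateTypeCode codeListValue="publication">Publication</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
  <gmd:date><gmd:CI_Date>
    <gmd:date><gco:Date>1900-01-01</gco:Date></gmd:date>
    <gmd:dateType><gmd:CI_DateTypeCode codeListValue="revision">Revision</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
</gmd:CI_Citation>"""

# Select the gco:Date inside the CI_Date whose CI_DateTypeCode carries the
# requested codeListValue.
DATE_BY_KIND = ('//gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode/@codeListValue = $kind]'
                '/gmd:date/gco:Date')


def set_date(root, kind, value):
    """Hypothetical helper: set the date for one date type.

    Returns the number of gco:Date elements that were updated.
    """
    matches = root.xpath(DATE_BY_KIND, namespaces=NS, kind=kind)
    for elem in matches:
        elem.text = str(value)
    return len(matches)


root = etree.fromstring(SAMPLE)
set_date(root, 'revision', '2017-05-03')
dates = [d.text for d in root.xpath('//gco:Date', namespaces=NS)]
print(dates)
```

Returning the match count lets the caller notice when a date type is absent from the file, instead of failing with an IndexError on an empty result list.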
Upvotes: 1