manoj1123

Reputation: 125

Editing the XML texts from a XML file using Python

I have an XML file which contains some data as given.

<?xml version="1.0" encoding="UTF-8" ?>
<ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>

I also have a text file with lines like these:

Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3   
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3

Now I want to edit the XML text, i.e. replace each n/a field with the corresponding value from the text file, so that the file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>TRUE</Value>
      <Result>TRUE</Result>
    </Parameter>
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>19-Flat2-HS3</Value>
      <Result>19-Flat2-HS3</Result>
    </Parameter>
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>FALSE</Value>
      <Result>FALSE</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>4-1-Bead1-HS3</Value>
      <Result>4-1-Bead1-HS3</Result>
    </Parameter>
  </ParameterList>
</ParameterData>

I am new to Python XML coding and have no idea how to edit the text fields in an XML file. I am trying to use the elementtree.ElementTree module, but I don't know which modules need to be imported to read the lines of the XML file and extract the attributes.

Please help.

Thanks and Regards.

Upvotes: 6

Views: 13225

Answers (4)

YOU

Reputation: 123831

You can convert your text data into a Python dictionary with a regular expression:

data="""Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3"""

#data=open("data.txt").read()

import re

data=dict(re.findall(r'(Spec \d+ (?:Included|Label))\s*:\s*(\S+)',data))

data will be as follows

{'Spec 3 Included': 'FALSE', 'Spec 2 Included': 'TRUE', 'Spec 3 Label': '4-1-Bead1-HS3', 'Spec 2 Label': '19-Flat2-HS3'}
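Note that the `(\S+)` group stops at the first whitespace, so a value containing spaces would be cut short. If that matters, a possible alternative is a line-based split on the first colon. This is only a minimal sketch, assuming every line has the form `name : value`; `parse_changes` is a hypothetical helper name, not part of the answer above.

```python
def parse_changes(text):
    """Build a {name: value} dict from lines of the form 'name : value'."""
    changes = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so colons in the value survive
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon (e.g. blank lines)
            changes[name.strip()] = value.strip()
    return changes


sample = """Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3"""

changes = parse_changes(sample)
print(changes["Spec 2 Label"])
```

Unlike the regular expression, this accepts any parameter name, so it may pick up lines you did not intend to treat as parameters.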

Then you can update the XML using any of your favorite XML parsers; I will use minidom here.

from xml.dom import minidom

# xml_text is assumed to hold the XML document as a string,
# e.g. read from your own file: xml_text = open("input.xml").read()
dom = minidom.parseString(xml_text)
params = dom.getElementsByTagName("Parameter")
for param in params:
    name = param.getAttribute("name")
    if name in data:
        for item in param.getElementsByTagName("*"):  # You may change to "Result" or "Value" only
            item.firstChild.replaceWholeText(data[name])

print(dom.toxml())

# write to file
with open("output.xml", "w") as out_file:
    out_file.write(dom.toxml())

Results

<?xml version="1.0" ?><ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj"/>
  <ParameterList count="85">
    <Parameter mode="both" name="Spec 2 Included" type="boolean">
      <Value>TRUE</Value>
      <Result>TRUE</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 2 Label" type="string">
      <Value>19-Flat2-HS3</Value>
      <Result>19-Flat2-HS3</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 3 Included" type="boolean">
      <Value>FALSE</Value>
      <Result>FALSE</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 3 Label" type="string">
      <Value>4-1-Bead1-HS3</Value>
      <Result>4-1-Bead1-HS3</Result>
    </Parameter>
  </ParameterList>
</ParameterData>

Upvotes: 6

Uche Ogbuji

Reputation: 11

Here is how you could do it using Amara

from amara import bindery

doc = bindery.parse(XML)

def cleanup_for_dict(key, value):
    return key.strip(), value.strip()

params = dict(( cleanup_for_dict(*line.split(':', 1))
                for line in TEXT.splitlines()))

for param in doc.ParameterData.ParameterList.Parameter:
    if param.name in params:
        param.Value = params[param.name]
        param.Result = params[param.name]

doc.xml_write()

Upvotes: 1

wierob

Reputation: 4359

Unfortunately, the XPath subset supported by ElementTree isn't complete. Python 2.6 bundles an older ElementTree version in which finding elements by attribute does not work, so the code below checks the name attribute in a loop instead. Python's own documentation should be your first stop: xml.etree.ElementTree

import xml.etree.ElementTree as ET

original = ET.parse("original.xml")
parameters = original.findall(".//Parameter")
changes = {}

# read changes
with open("changes.txt", "r") as in_file:
    for change in in_file:
        change = change.strip()                 # remove line endings
        if not change:
            continue                            # skip blank lines
        name, value = change.split(":", 1)      # split on the first colon only
        changes[name.strip()] = value.strip()   # remove whitespace

# find parameter elements and apply changes
for parameter in parameters:
    parameter_name = parameter.get("name")
    if parameter_name in changes:
        value = parameter.find("./Value")
        value.text = changes[parameter_name]
        result = parameter.find("./Result")
        result.text = changes[parameter_name]

original.write("new.xml")
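For reference, Python 2.7 and later bundle ElementTree 1.3, which does support attribute predicates, so a single parameter can be looked up directly there. A small sketch, using an inline XML string trimmed from the question in place of original.xml:

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample of the question's XML, inlined so the sketch is self-contained
xml_text = """<ParameterData>
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>"""

root = ET.fromstring(xml_text)

# [@name='...'] selects only the Parameter whose name attribute matches
parameter = root.find(".//Parameter[@name='Spec 2 Label']")
parameter.find("Value").text = "19-Flat2-HS3"
parameter.find("Result").text = "19-Flat2-HS3"

print(parameter.find("Value").text)
```

With many changes to apply, the loop over all Parameter elements shown above is probably still simpler than running one lookup per name.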

Upvotes: 1

Jason Orendorff

Reputation: 45086

Well, you could start with

import xml.etree.ElementTree as ET
tree = ET.parse("blah.xml")

Find the elements you want to modify.
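One way to do the finding step, as a rough sketch: loop over the Parameter elements and pick the ones whose name attribute matches. The inline XML is a trimmed copy of the question's file, and the target name is just an example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample of the question's XML, inlined so the sketch is self-contained
xml_text = """<ParameterData>
  <ParameterList count="85">
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>"""

root = ET.fromstring(xml_text)

wanted = "Spec 3 Included"  # example name, taken from the question
matches = []
for parameter in root.findall(".//Parameter"):
    if parameter.get("name") == wanted:
        # collect the child elements whose text should change
        matches.append(parameter.find("Value"))
        matches.append(parameter.find("Result"))

print([element.tag for element in matches])
```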

To replace the contents of an element, just do

element.text = "TRUE"

The import statement above works in Python 2.5 or later. If you have an older version of Python you'll need to install ElementTree as an extension, and then the import statement is different: import elementtree.ElementTree as ET.
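If the same script has to run on both kinds of installation, a common pattern is to try the standard-library import first and fall back to the add-on package. A small sketch:

```python
try:
    # Python 2.5 and later: ElementTree ships with the standard library
    import xml.etree.ElementTree as ET
except ImportError:
    # Older Python with the separately installed ElementTree package
    import elementtree.ElementTree as ET

# Either way, the rest of the code can use the same ET name
element = ET.fromstring("<Value>n/a</Value>")
element.text = "TRUE"
print(element.text)
```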

Upvotes: 5
