Surya Gupta

Reputation: 117

Remove XML tags with lxml

I want to strip the XML tags from the file "new.xml" and print the data with a label in front of each value.

I have tried with:

    from lxml import etree

    tree = etree.parse("C:\\Users\\name\\Desktop\\new.xml")
    root = tree.getroot()
    for text in root.iter():
        print text.text

XML code is:

<connection>
<rhel>

<runscript>yes</runscript>
<username>username</username>
<password>passw</password>
<store>None</store>
<port>2</port>
<host>192.168.73.56</host>
<logdirectory>logs</logdirectory>
</rhel>

</connection>

I got the following output:

yes
username
passw
None
2
192.168.73.56
logs

But I want to print it as:

is it a new connection: yes
username: username
password: passw
value: none
connections: 2
host: 192.168.73.56
log dir : logs

Upvotes: 0

Views: 738

Answers (1)

jadkik94

Reputation: 7078

You need to parse the file according to its structure. To do this, loop over the children of each node and read the tag name and text of each one.

from lxml import etree

tree = etree.parse("test.xml")
root = tree.getroot()

connections = []
# For every 'rhel' node that is a direct child of the root 'connection' node,
# build a dictionary with (tag, text) as the (key, value) pairs.
for node in root.findall('rhel'):
    connections.append({info.tag: info.text for info in node})

print connections

for conn in connections:
    print '='*20
    print """is it a new connection: {runscript}
username: {username}
password: {password}
value: {store}
connections: {port}
host: {host}
log dir : {logdirectory}""".format(**conn)
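
For readers on Python 3, the same idea can be sketched as follows. This is only a sketch under a few assumptions: the XML is inlined as a string instead of being read from a file, the sample values are taken from the output shown in the question, and the standard-library xml.etree.ElementTree stands in for lxml.etree (the calls used here, fromstring, findall, and iterating over an element's children, are available in both). The LABELS list and the describe helper are hypothetical names introduced for illustration.

```python
# Assumption: stdlib stand-in; `from lxml import etree` offers the same calls used below.
import xml.etree.ElementTree as etree

XML = """<connection>
<rhel>
<runscript>yes</runscript>
<username>username</username>
<password>passw</password>
<store>None</store>
<port>2</port>
<host>192.168.73.56</host>
<logdirectory>logs</logdirectory>
</rhel>
</connection>"""

# Hypothetical mapping from tag name to output label, kept as a list of
# pairs so that the print order is explicit and independent of the file.
LABELS = [
    ("runscript", "is it a new connection"),
    ("username", "username"),
    ("password", "password"),
    ("store", "value"),
    ("port", "connections"),
    ("host", "host"),
    ("logdirectory", "log dir"),
]


def describe(node):
    """Return one 'label: value' line per known child tag of `node`."""
    values = {child.tag: child.text for child in node}
    return ["{}: {}".format(label, values.get(tag)) for tag, label in LABELS]


root = etree.fromstring(XML)
lines = [line for rhel in root.findall("rhel") for line in describe(rhel)]
print("\n".join(lines))
```

Keeping the labels in one list means a missing tag shows up as None instead of raising a KeyError, which may or may not be what you want.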

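With the approach you were using, you could inspect each element, for example with repr(root), to see where the printed text comes from. Iterating over every element with root.iter() is not recommended, though, for several reasons:

  1. The output order depends entirely on the order of the elements in the file, so it changes whenever the file is rearranged.
  2. It ignores the structure of the XML file.
  3. It prints blank lines, because container elements such as connection and rhel hold only whitespace as text; that is expected behaviour.
  4. That's just not how you parse XML :)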
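
The extra lines can be made visible by printing each element's tag next to the repr of its text. A small sketch, assuming the standard-library xml.etree.ElementTree as a stand-in for lxml.etree and a shortened, inlined copy of the XML from the question:

```python
# Assumption: stdlib stand-in; lxml.etree provides fromstring and iter() as well.
import xml.etree.ElementTree as etree

XML = """<connection>
<rhel>
<port>2</port>
<host>192.168.73.56</host>
</rhel>
</connection>"""

root = etree.fromstring(XML)

# iter() yields the root itself, then every descendant in document order.
visited = [(el.tag, el.text) for el in root.iter()]
for tag, text in visited:
    print("{}: {!r}".format(tag, text))

# Container elements hold only whitespace as text; printing that text
# directly is what produces the blank lines.
containers = [tag for tag, text in visited if text is not None and not text.strip()]
```

Here containers collects the tags whose text is whitespace only, which are exactly the elements a plain loop over root.iter() should skip.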

Hope it helps.

Update:

For Python < 2.7, use connections.append(dict((info.tag, info.text) for info in node)) instead of the dict comprehension line. Dict comprehensions were only added in Python 2.7.

Or, as a last resort, you can build the dictionary with a plain loop:

c = {}
for info in node:
    c[info.tag] = info.text
connections.append(c)
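
As a quick check that the three ways of building the dictionary are interchangeable, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree in place of lxml.etree and a cut-down rhel node; the variable names are made up for the comparison.

```python
# Assumption: stdlib stand-in for lxml.etree; child iteration works the same way.
import xml.etree.ElementTree as etree

node = etree.fromstring("<rhel><port>2</port><host>192.168.73.56</host></rhel>")

# Dict comprehension (Python 2.7 and later).
by_comprehension = {info.tag: info.text for info in node}

# dict() over a generator of (key, value) pairs (works on older versions too).
by_dict_call = dict((info.tag, info.text) for info in node)

# Explicit loop.
by_loop = {}
for info in node:
    by_loop[info.tag] = info.text

print(by_comprehension == by_dict_call == by_loop)
```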

Also, str.format was only added in Python 2.6, so on an older version replace it with the old % string formatting:

    print """is it a new connection: %(runscript)s
username: %(username)s
password: %(password)s
value: %(store)s
connections: %(port)s
host: %(host)s
log dir : %(logdirectory)s""" % conn

Upvotes: 1
