Devanshu Misra

Reputation: 823

How to parse and fetch only the desired XML elements from an XML file using python?

I have an XML file which looks like this:

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>2</vlan-index>
            <vlan-l3-interface>vlan.15 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>7</vlan-members-count>
            <vlan-members-upcount>6</vlan-members-upcount>
        </vlan>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>default</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>3</vlan-index>
            <vlan-l3-interface>vlan.11 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>12</vlan-members-count>
            <vlan-members-upcount>2</vlan-members-upcount>
        </vlan>
    </vlan-information>
</rpc-reply>

From this, I only want to parse the <vlan-name> and <vlan-l3-interface-address> tags and save them in a dict/JSON-like variable with this format:

{'Vlan-Name' : vlan_name, 'Interface-Address' : interface_addr}

I then want to append one such dict per <vlan> element to a list of dicts. This is my code for parsing the file and building the list:

root = tree.getroot()
nw_pool = []
nw_json = {}
for child in root:
    for items in child:
        for item1 in items:
            if 'vlan-l3-interface-address' in item1.tag:
                interface_addr = item1.text
                nw_json['Interface-Address'] = interface_addr
            elif 'vlan-name' in item1.tag:
                vlan_name = item1.text
                nw_json['Vlan-Name'] = vlan_name
                nw_pool.append(nw_json)
print(nw_pool)

But when I print nw_pool, the dict for the last element found is repeated, instead of a distinct dict for each element.

Output:

[{'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}]

Whereas my desired output is:

[{'Vlan-Name': 'ACRS-Dev2', 'Interface-Address': '10.8.25.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}] 

Can somebody help me with this? Thanks in advance.

Upvotes: 1

Views: 190

Answers (2)

PythonSherpa

Reputation: 2600

You are reusing and overwriting a single dict, so the list ends up holding the same object several times. You need a new dict for every <vlan> element, so move nw_json = {} inside the loop:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')   # assumed file name; use your own path
root = tree.getroot()
nw_pool = []
for child in root:
    for items in child:
        nw_json = {}   # new dict for every <vlan> element
        for item1 in items:
            if 'vlan-l3-interface-address' in item1.tag:
                nw_json['Interface-Address'] = item1.text
            elif 'vlan-name' in item1.tag:
                nw_json['Vlan-Name'] = item1.text
        if nw_json:    # skip elements without these children, e.g. <vlan-terse/>
            nw_pool.append(nw_json)
print(nw_pool)
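
If you would rather not match tags by substring, one option is to let ElementTree resolve the namespace for you. This is only a sketch: the XML is a trimmed copy of the sample from the question, inlined so the snippet runs on its own, the prefix "esw" is an arbitrary name I picked (only the URI has to match your file), and extract_vlans is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
XML = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
        </vlan>
        <vlan>
            <vlan-name>default</vlan-name>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
        </vlan>
    </vlan-information>
</rpc-reply>"""

# "esw" is an arbitrary prefix chosen for this sketch; the URI is the default
# namespace declared on <vlan-information> in the question's file.
NS = {"esw": "http://xml.juniper.net/junos/15.1R5/junos-esw"}


def extract_vlans(root):
    """Return one dict per <vlan> element (hypothetical helper)."""
    nw_pool = []
    for vlan in root.iterfind(".//esw:vlan", NS):
        # A new dict literal is built on every pass, so entries never alias.
        nw_pool.append({
            "Vlan-Name": vlan.findtext("esw:vlan-name", default="", namespaces=NS).strip(),
            "Interface-Address": vlan.findtext(
                "esw:vlan-l3-interface-address", default="", namespaces=NS
            ).strip(),
        })
    return nw_pool


nw_pool = extract_vlans(ET.fromstring(XML))
print(nw_pool)
```

For a file on disk you would pass ET.parse(path).getroot() instead of ET.fromstring(XML). A <vlan> without one of the two children gets an empty string for that key rather than raising an error.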

Upvotes: 1

SanthoshSolomon

Reputation: 1402

The problem in your code is that you create the dict once, before the loops, so every iteration overwrites the same object and the list ends up holding several references to it.

@Hoenie's answer gives clarity about your mistake.
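
To see the effect in isolation, here is a minimal sketch with no XML involved. The names shared, buggy and fixed are made up for illustration; the two VLAN names are taken from your sample.

```python
names = ["ACRS-Dev2", "default"]

# Buggy pattern: one dict created before the loop and appended on every pass.
shared = {}
buggy = []
for name in names:
    shared["Vlan-Name"] = name   # overwrites the value in the one shared dict
    buggy.append(shared)         # appends a reference to it, not a copy

# Fixed pattern: a fresh dict is created inside the loop.
fixed = []
for name in names:
    entry = {"Vlan-Name": name}
    fixed.append(entry)

print(buggy)                 # both entries show 'default'
print(buggy[0] is buggy[1])  # True: the list holds the same object twice
print(fixed)                 # one distinct dict per name
```

Appending a dict to a list never copies it, which is why the last values written show up in every entry.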

Adding to that, I would suggest trying BeautifulSoup for parsing the XML, as it is simple and easy to understand. Try the code below.

from bs4 import BeautifulSoup

# Read the whole file; the with block closes it afterwards
with open('test.xml') as file_obj:
    soup = BeautifulSoup(file_obj.read(), 'lxml')

nw_pool = []
for vlan in soup.find_all('vlan'):   # find_all is the current name for findAll
    nw_json = dict()                 # new dict for every <vlan>
    nw_json['Interface-Address'] = vlan.find('vlan-l3-interface-address').text
    nw_json['Vlan-Name'] = vlan.find('vlan-name').text
    nw_pool.append(nw_json)
print(nw_pool) # O/P [{'Interface-Address': '10.8.25.1/24', 'Vlan-Name': 'ACRS-Dev2'}, {'Interface-Address': '10.8.27.1/24', 'Vlan-Name': 'default'}]

Cheers!

Upvotes: 1
