MLSC

Reputation: 5972

Parsing data for xml file in python

I have the following xml file:

<address addr="x.x.x.x" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><port protocol="tcp" portid="1"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="tcpmux" method="table" conf="3"/></port>
<port protocol="tcp" portid="64623"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="unknown" method="table" conf="3"/></port>
</ports>
<times srtt="621179" rttvar="35357" to="762607"/>
</host>
<host starttime="1418707433" endtime="1418707742"><status state="up" reason="syn-ack" reason_ttl="0"/>
<address addr="y.y.y.y" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><port protocol="tcp" portid="1"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="tcpmux" method="table" conf="3"/></port>
<port protocol="tcp" portid="64680"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="unknown" method="table" conf="3"/></port>
</ports>
<times srtt="834906" rttvar="92971" to="1206790"/>
</host>
<host starttime="1418707433" endtime="1418707699"><status state="up" reason="syn-ack" reason_ttl="0"/>
<address addr="w.w.w.w" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><extraports state="filtered" count="997">
<extrareasons reason="no-responses" count="997"/>
</extraports>
<port protocol="tcp" portid="25"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="smtp" method="table" conf="3"/></port>
<port protocol="tcp" portid="443"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="https" method="table" conf="3"/></port>
<port protocol="tcp" portid="7443"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="oracleas-https" method="table" conf="3"/></port>
</ports>
<times srtt="690288" rttvar="110249" to="1131284"/>
</host>

Here is what I tried in order to extract the data for each IP:

import sys
import xml.etree.ElementTree as ET
input=sys.argv[1]

tree=ET.parse(input)
root=tree.getroot()

for host in root.findall('host'):
    updown=host.find('status').get('state')
    if updown=='up':
        print 'IP Address: '+host.find('address').get('addr')
        ports=[port.get('portid') for port in root.findall('.//port')]
        state=[port.get('state') for port in root.findall('.//port/state')]
        name=[port.get('name') for port in root.findall('.//port/service')]

But on every iteration this returns the information for all IPs at once. How can I get only the information that belongs to each IP?

I think I should change the root.findall calls, but I don't know how to do that.

Upvotes: 2

Views: 648

Answers (3)

StuartLC

Reputation: 107237

By specifying

root.findall('.//port')

you are starting the search again from the root of the document, so the ports of every host are returned. Search relative to the current host instead:

ports=[port.get('portid') for port in host.findall('./ports/port')]
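The difference between the two search scopes can be seen in a small sketch. The sample document here is made up for illustration: the `<nmaprun>` root element, the 192.0.2.x addresses, and the trimmed-down attributes are assumptions, not the asker's actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample; the real file has more attributes.
SAMPLE = """<nmaprun>
<host><status state="up"/><address addr="192.0.2.1" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="1"/><port protocol="tcp" portid="64623"/></ports>
</host>
<host><status state="up"/><address addr="192.0.2.2" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="25"/></ports>
</host>
</nmaprun>"""

root = ET.fromstring(SAMPLE)
first_host = root.find('host')

# Searching from the root matches the ports of every host in the document.
all_ports = [p.get('portid') for p in root.findall('.//port')]

# Searching from a host element matches only that host's ports.
host_ports = [p.get('portid') for p in first_host.findall('./ports/port')]

print(all_ports)
print(host_ports)
```

The path './ports/port' is stricter than './/port': it only matches port elements that sit directly under the host's ports element, which is enough for the structure shown in the question.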

Upvotes: 1

mhawke

Reputation: 87054

Within the loop just change root.findall() to host.findall():

for host in root.findall('host'):
    updown=host.find('status').get('state')
    if updown=='up':
        print 'IP Address: '+host.find('address').get('addr')
        ports=[port.get('portid') for port in host.findall('.//port')]
        state=[port.get('state') for port in host.findall('.//port/state')]
        name=[port.get('name') for port in host.findall('.//port/service')]

This limits the search for ports, states and names to those within each host, rather than those within the whole XML document.
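As a possible extension of the same idea, the three separate lists can be replaced by one loop over each host's port elements, so that every port number stays paired with its own state and service name. This is only a sketch in Python 3 syntax: the function name `ports_by_host`, the `<nmaprun>` root, and the sample addresses are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def ports_by_host(root):
    """Map each 'up' host address to a list of (portid, state, service) tuples."""
    result = {}
    for host in root.findall('host'):
        status = host.find('status')
        if status is None or status.get('state') != 'up':
            continue  # skip hosts that are down or have no status element
        addr = host.find('address').get('addr')
        entries = []
        # Search relative to this host, not the document root.
        for port in host.findall('./ports/port'):
            state = port.find('state')
            service = port.find('service')
            entries.append((
                port.get('portid'),
                state.get('state') if state is not None else None,
                service.get('name') if service is not None else None,
            ))
        result[addr] = entries
    return result

# Hypothetical sample data, loosely modelled on the question's file.
SAMPLE = """<nmaprun>
<host><status state="up"/><address addr="192.0.2.1" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="1"><state state="open"/><service name="tcpmux"/></port>
<port protocol="tcp" portid="64623"><state state="open"/><service name="unknown"/></port>
</ports>
</host>
<host><status state="down"/><address addr="192.0.2.2" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="25"><state state="closed"/><service name="smtp"/></port></ports>
</host>
</nmaprun>"""

hosts = ports_by_host(ET.fromstring(SAMPLE))
for addr, entries in hosts.items():
    print('IP Address:', addr)
    for portid, state, name in entries:
        print('  port', portid, state, name)
```

With a real file, the root would come from `ET.parse(path).getroot()` as in the question, and the same function could be applied unchanged.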

Upvotes: 2

Dmitry Ilukhin

Reputation: 498

To me, this code looks suspicious:

        ports=[port.get('portid') for port in root.findall('.//port')]
        state=[port.get('state') for port in root.findall('.//port/state')]
        name=[port.get('name') for port in root.findall('.//port/service')]

Inside the loop, you are searching the entire root node for the './/port...' paths.
It seems you need this instead:

        ports=[port.get('portid') for port in host.findall('.//port')]
        state=[port.get('state') for port in host.findall('.//port/state')]
        name=[port.get('name') for port in host.findall('.//port/service')]

Upvotes: 1
