user3319356

Reputation: 173

How to extract part of xml file

I have a big XML file that looks like the one below. Basically, I want to extract the part of the file that contains, for example, <ManagedElementId string = "rbs064841"/>.

 <Model version = "1" importVersion = "12.2">
        <Create>
            <SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs064841"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
             <SubNetwork networkType = "WRAN" userLabel = "AHPT78">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs04798"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
            <SubNetwork networkType = "WRAN" userLabel = "AHPT4">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs04456"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
        </Create>
    </Model>

That is, after parsing I want to extract this part:

<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
            <ManagedElement sourceType = "CELLO">
                <ManagedElementId string = "rbs064841"/>
                <primaryType type = "RBS"/>
                <managedElementType types = ""/>
                <associatedSite string = "Site=site06484"/>
                <nodeVersion string = "W12B"/>
                <platformVersion string = "Cello 12.2"/>
                <swVersion string = ""/>
                <vendorName string = "ERICSSON"/>
                <userDefinedState string = ""/>
                <managedServiceAvailability int = "1"/>
                <isManaged boolean = "true"/>
                <neMIMVersion string = "vS.1.150"/>
                <connectionStatus string = "ON"/>
            </ManagedElement>
        </SubNetwork>

So I want to search the big XML file by ManagedElementId and, when it is found, extract the enclosing part, from <SubNetwork> to </SubNetwork>. I know how to extract data from an XML file, but I don't know how to extract a part of the file itself. I'm using Python's ElementTree. Any advice would be helpful.

Upvotes: 1

Views: 4185

Answers (1)

Anzel

Reputation: 20553

Use find() with an XPath expression that locates the ManagedElementId by its attribute and then steps up to the enclosing SubNetwork node, like this:

s = '''<Model version = "1" importVersion = "12.2">
        <Create>
            <SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs064841"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
             <SubNetwork networkType = "WRAN" userLabel = "AHPT78">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs04798"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
            <SubNetwork networkType = "WRAN" userLabel = "AHPT4">
                <ManagedElement sourceType = "CELLO">
                    <ManagedElementId string = "rbs04456"/>
                    <primaryType type = "RBS"/>
                    <managedElementType types = ""/>
                    <associatedSite string = "Site=site06484"/>
                    <nodeVersion string = "W12B"/>
                    <platformVersion string = "Cello 12.2"/>
                    <swVersion string = ""/>
                    <vendorName string = "ERICSSON"/>
                    <userDefinedState string = ""/>
                    <managedServiceAvailability int = "1"/>
                    <isManaged boolean = "true"/>
                    <neMIMVersion string = "vS.1.150"/>
                    <connectionStatus string = "ON"/>
                </ManagedElement>
            </SubNetwork>
        </Create>
    </Model>'''

# I'd prefer lxml, but since you're using the standard xml module...
import xml.etree.ElementTree as ET
tree = ET.fromstring(s)

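# The SubNetwork node you want is the parent of the parent of ManagedElementId,
# so step up twice with "/../.." (a trailing "/" would mean "/*" and select children instead)
node = tree.find('.//ManagedElementId[@string="rbs064841"]/../..')

# Python 3: encoding="unicode" makes tostring() return str instead of bytes
print(ET.tostring(node, encoding="unicode"))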
<SubNetwork networkType="WRAN" userLabel="AHPTUR14">
                <ManagedElement sourceType="CELLO">
                    <ManagedElementId string="rbs064841"/>
                    <primaryType type="RBS"/>
                    <managedElementType types=""/>
                    <associatedSite string="Site=site06484"/>
                    <nodeVersion string="W12B"/>
                    <platformVersion string="Cello 12.2"/>
                    <swVersion string=""/>
                    <vendorName string="ERICSSON"/>
                    <userDefinedState string=""/>
                    <managedServiceAvailability int="1"/>
                    <isManaged boolean="true"/>
                    <neMIMVersion string="vS.1.150"/>
                    <connectionStatus string="ON"/>
                </ManagedElement>
            </SubNetwork>
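
If you would rather not depend on how many levels separate ManagedElementId from SubNetwork, one alternative is to loop over the SubNetwork elements and test each one for a matching descendant. The following is only a sketch: `find_subnetwork` and `SAMPLE` are names made up for illustration, the sample is a trimmed-down version of the XML above, and it assumes the id contains no double quote character.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same shape as the XML in the question.
SAMPLE = """<Model version="1" importVersion="12.2">
  <Create>
    <SubNetwork networkType="WRAN" userLabel="AHPTUR14">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs064841"/>
      </ManagedElement>
    </SubNetwork>
    <SubNetwork networkType="WRAN" userLabel="AHPT78">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs04798"/>
      </ManagedElement>
    </SubNetwork>
  </Create>
</Model>"""


def find_subnetwork(root, element_id):
    """Return the first SubNetwork containing the given ManagedElementId, or None."""
    # Assumption: element_id has no double quotes, so it can be embedded directly.
    query = './/ManagedElementId[@string="{}"]'.format(element_id)
    for subnetwork in root.iter("SubNetwork"):
        # Compare with None explicitly: an Element without children is falsy.
        if subnetwork.find(query) is not None:
            return subnetwork
    return None


root = ET.fromstring(SAMPLE)
match = find_subnetwork(root, "rbs04798")
if match is not None:
    print(match.get("userLabel"))  # prints AHPT78
```

This does a little more work than a single find() call, but it keeps matching the right SubNetwork even if extra wrapper elements are later added between SubNetwork and ManagedElementId.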

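If you are parsing from a file instead of a string, parse it and call getroot() to get the root element:

doc = ET.parse('file.xml')   # ElementTree object
tree = doc.getroot()         # root Element, used as `tree` above
...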
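
Since the goal is to extract a part of the file, it may also help to see the matched node written back out as its own XML document. This is a rough sketch, not a definitive implementation: `extract_subnetwork`, the file names, and the shortened sample content are all assumptions for the demo, and a temporary directory stands in for your real paths.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def extract_subnetwork(src_path, dst_path, element_id):
    """Write the SubNetwork containing element_id from src_path to dst_path.

    Returns True if a match was found and written, False otherwise.
    """
    root = ET.parse(src_path).getroot()
    # Step up twice: ManagedElementId -> ManagedElement -> SubNetwork.
    query = './/ManagedElementId[@string="{}"]/../..'.format(element_id)
    node = root.find(query)
    if node is None:
        return False
    # Wrap the element in its own tree so it can be saved as a standalone file.
    ET.ElementTree(node).write(dst_path, encoding="utf-8", xml_declaration=True)
    return True


# Demo with throwaway files; the XML is a shortened version of the question's.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "model.xml")
dst = os.path.join(workdir, "subnetwork.xml")
with open(src, "w") as handle:
    handle.write(
        '<Model version="1"><Create>'
        '<SubNetwork networkType="WRAN" userLabel="AHPTUR14">'
        '<ManagedElement sourceType="CELLO">'
        '<ManagedElementId string="rbs064841"/>'
        '</ManagedElement></SubNetwork>'
        '<SubNetwork networkType="WRAN" userLabel="AHPT78">'
        '<ManagedElement sourceType="CELLO">'
        '<ManagedElementId string="rbs04798"/>'
        '</ManagedElement></SubNetwork>'
        '</Create></Model>'
    )

found = extract_subnetwork(src, dst, "rbs04798")
print(found)
```

After a successful call, the destination file holds only the matching SubNetwork element with its children.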

Hope this helps.

Upvotes: 1
