Nmayo

Reputation: 13

Parsing out data in an XML file using Python

I have an XML file from which I need to strip out certain tags, ideally using a wildcard, because the data within the tags differs from one grouping to the next. See the XML below:

    <relationship relation="1">
        <sourcedid>
            <source>xxxxx</source>
            <id>AbDT-1398</id>  <!-- this data will be different for each grouping -->
        </sourcedid>
        <label/>
    </relationship>

Basically, I need to search the XML file for this grouping, treating the data inside the tags as a wildcard, and remove the entire grouping. The tags are the same throughout my XML; only the data changes.

Upvotes: 1

Views: 474

Answers (1)

Aufwind

Reputation: 26278

If I understood you correctly, you want to remove certain tags (and possibly their contents) from your XML file. Try using lxml to process the XML file. Have a look at these two functions from lxml.etree.

strip_elements: Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants.
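
A minimal sketch of that first behaviour, using lxml.etree.strip_elements on data shaped like the XML in the question. The <membership> wrapper and the <person> element are assumptions added only so that something is left over after the removal; adapt the tag names to your real file.

```python
from lxml import etree

# Hypothetical sample modelled on the question's XML. The <membership>
# wrapper and <person> element are illustrative assumptions.
xml = b"""<membership>
  <person>kept</person>
  <relationship relation="1">
    <sourcedid>
      <source>xxxxx</source>
      <id>AbDT-1398</id>
    </sourcedid>
    <label/>
  </relationship>
  <relationship relation="1">
    <sourcedid>
      <source>xxxxx</source>
      <id>ZZ-0001</id>
    </sourcedid>
    <label/>
  </relationship>
</membership>"""

root = etree.fromstring(xml)

# Remove every <relationship> element together with its whole subtree.
# Matching is by tag name only, so the differing <id> values do not matter.
etree.strip_elements(root, "relationship")

remaining_tags = [el.tag for el in root.iter()]
print(remaining_tags)  # ['membership', 'person']
```

Because the match is on the tag name, no wildcard is needed for the changing <id> data: every grouping goes, whatever it contains.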

strip_tags: Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their attributes, but not their text/tail content or descendants. Instead, it will merge the text content and children of the element into its parent.
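
For contrast, here is a small sketch of the second behaviour with lxml.etree.strip_tags. The one-line input is a condensed, hypothetical version of the grouping from the question; only the <sourcedid> wrapper is dropped, and its child moves up into the parent.

```python
from lxml import etree

# Condensed, hypothetical version of the grouping from the question.
root = etree.fromstring(
    "<relationship><sourcedid><id>AbDT-1398</id></sourcedid><label/></relationship>"
)

# Drop the <sourcedid> tag itself; its child <id> is merged into <relationship>.
etree.strip_tags(root, "sourcedid")

result = etree.tostring(root).decode()
print(result)  # <relationship><id>AbDT-1398</id><label/></relationship>
```

If the goal is to delete the whole grouping, as the question describes, strip_elements is probably the one you want; strip_tags only unwraps.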

Is this what you are looking for? If so, there is a nice answer on SO you should have a look at.

Upvotes: 2
