Reputation: 6319
I'm generating an XML file using Python and markup.py. It was all working, but after recent changes to the script, the checks I put in place now produce duplicated values in the nodes. Here's a sample of the output (these are vehicle records):
<?xml version='1.0' encoding='UTF-8' ?>
<datafeed>
<vehicle>
<vin>2HNYD18816H532105</vin>
<features>
<feature>AM/FM Radio</feature>
<feature>Air Conditioning</feature>
<feature>Anti-Lock Brakes (ABS)</feature>
<feature>Alarm</feature>
<feature>CD Player</feature>
<feature>Air Bags</feature>
<feature>Air Bags</feature>
<feature>Anti-Lock Brakes (ABS)</feature>
<feature>Alarm</feature>
<feature>Air Bags</feature>
<feature>Alarm</feature>
<feature>Air Bags</feature>
</features>
</vehicle>
<vehicle>
<vin>2HKYF18746H537006</vin>
<features>
<feature>AM/FM Radio</feature>
<feature>Anti-Lock Brakes (ABS)</feature>
<feature>Air Bags</feature>
<feature>Air Bags</feature>
<feature>Anti-Lock Brakes (ABS)</feature>
<feature>Alarm</feature>
<feature>Air Bags</feature>
<feature>Alarm</feature>
</features>
</vehicle>
</datafeed>
This is a small excerpt from a larger XML file having over 100 records. What can I do to remove the duplicate nodes?
Upvotes: 1
Views: 1901
Reputation: 388023
There are no real "duplicates" in XML; every node is distinct by definition. But I understand that you want to get rid of the features that count as duplicates under your interpretation.
You can do this by simply parsing that tree, putting the features (the values of the nodes) in a set (to get rid of duplicates) and writing out a new XML document.
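As a rough sketch of that post-processing approach, assuming the file looks like the sample in the question and using the standard library's xml.etree.ElementTree (the function name dedupe_features and the SAMPLE string are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample in the same shape as the question's output.
SAMPLE = (
    "<datafeed><vehicle><vin>2HNYD18816H532105</vin><features>"
    "<feature>AM/FM Radio</feature>"
    "<feature>Alarm</feature>"
    "<feature>Air Bags</feature>"
    "<feature>Air Bags</feature>"
    "<feature>Alarm</feature>"
    "</features></vehicle></datafeed>"
)

def dedupe_features(xml_text):
    """Return XML text where each <features> block lists every value once."""
    root = ET.fromstring(xml_text)
    for features in root.iter("features"):
        seen = set()
        # Iterate over a copy, because elements are removed inside the loop.
        for feature in list(features):
            if feature.tag != "feature":
                continue
            text = (feature.text or "").strip()
            if text in seen:
                features.remove(feature)  # repeated value: drop the node
            else:
                seen.add(text)            # first occurrence: keep it
    return ET.tostring(root, encoding="unicode")

cleaned = dedupe_features(SAMPLE)
```

The set is reset for every features element, so a value is only treated as a duplicate within the same vehicle, not across vehicles.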
Given that you are generating the file with Python, you should modify the creation routine so that it doesn't generate duplicate values to begin with. It would help if you told us what markup.py is or does.
I just took a quick look at the markup script, so something like this might appear in your script:
# well, this might come from somewhere else, but I guess you have such a list somewhere
features = [ 'AM/FM Radio', 'Air Conditioning', 'Anti-Lock Brakes (ABS)', 'Alarm', 'CD Player', 'Air Bags', 'Air Bags', 'Anti-Lock Brakes (ABS)', 'Alarm', 'Air Bags', 'Alarm', 'Air Bags' ]
# write the XML
markup.features.open()
markup.feature( features )
markup.features.close()
In this case, just make features a set before passing it to the markup script:
# write the XML
markup.features.open()
markup.feature( set( features ) )
markup.features.close()
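One caveat: a set does not keep the original order of the features. If the order in the output matters to you, a small helper along these lines might do instead (unique_in_order is a made-up name, and the list is a shortened sample):

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Shortened sample list, values taken from the question's output.
features = ['AM/FM Radio', 'Air Conditioning', 'Anti-Lock Brakes (ABS)',
            'Alarm', 'CD Player', 'Air Bags', 'Air Bags', 'Alarm']
unique_features = unique_in_order(features)
```

You would then pass unique_features to the markup script in place of the set.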
If you have multiple separate lists that contain your features for a single vehicle, combine those lists (or sets) first:
list1 = [...]
list2 = [...]
list3 = [...]
features = set( list1 + list2 + list3 )
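Note that the + operator only works when all three are lists; it raises a TypeError as soon as one of them is a set. A variant that should cope with any mix of lists and sets is set().union(...), sketched here with made-up sample values:

```python
# Sample values only; list2 is deliberately a set to show the mixed case.
list1 = ['AM/FM Radio', 'Alarm']
list2 = {'Alarm', 'Air Bags'}
list3 = ['Air Bags', 'CD Player']

# union() accepts any number of iterables, so lists and sets can be mixed.
features = set().union(list1, list2, list3)

# Sorting gives a stable order if the output should be reproducible.
sorted_features = sorted(features)
```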
Upvotes: 1