Reputation: 7
I am working on a project in which I annotated images of leaves and saved the annotations in XML format, so that pests on the leaves can be identified with object detection. Some of the pests look alike even though they are actually different, which makes some objects ambiguous, so I decided to remove one class. Since all the images are already annotated, removing the labels by hand would be tedious, so I want to write a script that removes those objects from the XML files. The structure of a file is:
<annotation>
<folder>Set 3 A</folder>
<filename>IMG-20200904-WA0105.jpg</filename>
<path>C:\Users\Admin\Desktop\Set 3 A\Set 3 A\IMG-20200904-WA0105.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>960</width>
<height>1280</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>Whiteflies</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>232</xmin>
<ymin>83</ymin>
<xmax>286</xmax>
<ymax>173</ymax>
</bndbox>
</object>
<object>
<name>Jassid Attack Effect</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>356</xmin>
<ymin>7</ymin>
<xmax>563</xmax>
<ymax>359</ymax>
</bndbox>
</object>
<object>
<name>Jassid Attack Effect</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>356</xmin>
<ymin>7</ymin>
<xmax>563</xmax>
<ymax>359</ymax>
</bndbox>
</object>
<object>
<name>Whiteflies</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>232</xmin>
<ymin>83</ymin>
<xmax>286</xmax>
<ymax>173</ymax>
</bndbox>
</object>
I want to remove every object named "Jassid Attack Effect", together with its contents. It may appear several times in one document, as in the XML above, and all occurrences have to be removed. How can I do that? For example, whenever the parser finds an object whose name is "Jassid Attack Effect", I want this whole block removed from the XML file:
<object>
<name>Jassid Attack Effect</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>356</xmin>
<ymin>7</ymin>
<xmax>563</xmax>
<ymax>359</ymax>
</bndbox>
</object>
Upvotes: -1
Views: 242
Reputation: 21
pip install pascal-voc
from pascal import annotation_from_xml
from pascal.utils import save_xml
if __name__ == "__main__":
    ann = annotation_from_xml("ann.xml")
    ann.filter_objects(["Jassid Attack Effect"])
    xml = ann.to_xml()
    save_xml("new_ann.xml", xml)
Upvotes: 0
Reputation: 24928
Try something like this:
# The "r" prefix is needed because the XML contains unescaped backslashes.
# Also note that the XML in the question is not well-formed: the closing </annotation> tag is missing.
stuff = r"""your xml above"""
from lxml import etree
doc = etree.XML(stuff)
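# name="..." compares the text of <name>; loop so that every match is removed, not only the first
for target in doc.xpath('//object[name="Jassid Attack Effect"]'):
    target.getparent().remove(target)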
print(etree.tostring(doc).decode())
Output:
<annotation>
<folder>Set 3 A</folder>
<filename>IMG-20200904-WA0105.jpg</filename>
<path>C:\Users\Admin\Desktop\Set 3 A\Set 3 A\IMG-20200904-WA0105.jpg</path>
<source><database>Unknown</database></source>
<size>
<width>960</width>
<height>1280</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>Whiteflies</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>232</xmin>
<ymin>83</ymin>
<xmax>286</xmax>
<ymax>173</ymax>
</bndbox>
</object>
</annotation>
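Since the question mentions a whole set of annotated images, here is a minimal sketch of the same idea using only the standard library (xml.etree.ElementTree instead of lxml). The helper names remove_class and remove_class_in_folder are made up for illustration, and the sketch assumes every <object> is a direct child of <annotation>, as in the sample above. The folder helper overwrites files in place, so work on a copy of the annotations first.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def remove_class(root, class_name):
    """Remove every <object> whose <name> equals class_name; return the count removed."""
    removed = 0
    # findall() returns a list, so removing elements inside the loop is safe.
    for obj in root.findall("object"):
        if obj.findtext("name") == class_name:
            root.remove(obj)
            removed += 1
    return removed


def remove_class_in_folder(folder, class_name):
    """Hypothetical batch helper: rewrite each .xml file in folder that contains the class."""
    for xml_path in Path(folder).glob("*.xml"):
        tree = ET.parse(str(xml_path))
        if remove_class(tree.getroot(), class_name):
            tree.write(str(xml_path), encoding="utf-8")


# Small in-memory check with a shortened version of the annotation from the question.
sample = """<annotation>
<filename>IMG-20200904-WA0105.jpg</filename>
<object><name>Whiteflies</name><bndbox><xmin>232</xmin></bndbox></object>
<object><name>Jassid Attack Effect</name><bndbox><xmin>356</xmin></bndbox></object>
<object><name>Jassid Attack Effect</name><bndbox><xmin>356</xmin></bndbox></object>
<object><name>Whiteflies</name><bndbox><xmin>232</xmin></bndbox></object>
</annotation>"""

root = ET.fromstring(sample)
removed = remove_class(root, "Jassid Attack Effect")
remaining = [obj.findtext("name") for obj in root.findall("object")]
print(removed, remaining)
```

To process a real dataset you would call something like remove_class_in_folder("path/to/annotations", "Jassid Attack Effect"), where the folder path is a placeholder for wherever your XML files live.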
Upvotes: 0