BeautifulSoup/LXML.html: delete tag and its children if child looks like x

Question

I'm having trouble finding the right solution. I want to delete a <question> element and its children if its <answer> is 99. As a result, I need a string with the filtered questions. I have the following HTML structure:


         
  
   
    
     Do I have a question?
    
    
     99
    
   
   
    
     Do I love HTML/XML parsing?
    
    
     
      1 oh god yeah
     
     
      2 that makes me feel good
     
     
      3 oh hmm noo
     
     
      4 totally
     
     
     
      4

So far I have tried to do it with XPath, but lxml.html has no iterparse, does it? Thanks!

Matt Williamson · Accepted Answer

This will do exactly what you need:

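from xml.dom import minidom

# text is the markup from the question, as a string
doc = minidom.parseString(text)
for question in doc.getElementsByTagName('question'):
    for answer in question.getElementsByTagName('answer'):
        first = answer.firstChild  # None when the answer element is empty
        if first is not None and (first.nodeValue or '').strip() == '99':
            question.parentNode.removeChild(question)
            break  # the question is gone, so don't try to remove it twice

print(doc.toxml())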

Result:


         
  

   
    
     Do I love HTML/XML parsing?
    
    
     
      1 oh god yeah
     
     
      2 that makes me feel good
     
     
      3 oh hmm noo
     
     
      4 totally
     
     
     
      4
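
The same filtering can also be sketched with the ElementTree API from the standard library. This is only a minimal sketch under assumptions: the original markup did not survive here, so the sample document, the `questionnaire` and `text` tag names, and the `drop_questions` helper are all hypothetical; only `question` and `answer` come from the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <question>/<answer> names come from the answer above.
SAMPLE = """<questionnaire>
  <question><text>Do I have a question?</text><answer> 99 </answer></question>
  <question><text>Do I love HTML/XML parsing?</text><answer>4</answer></question>
</questionnaire>"""


def drop_questions(xml_text, unwanted="99"):
    """Return the document as a string, without questions answered `unwanted`."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for question in list(parent.findall("question")):
            answers = question.iter("answer")
            if any((a.text or "").strip() == unwanted for a in answers):
                # Removing the element drops its children as well.
                parent.remove(question)
    return ET.tostring(root, encoding="unicode")


filtered = drop_questions(SAMPLE)
print(filtered)
```

Walking parent by parent avoids needing a parent pointer, which the standard-library ElementTree does not provide. Note that this parser expects well-formed XML, so real-world HTML may need a more forgiving parser first.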
