Parsing all the text inside a tag using lxml in python

Question

I am trying to parse an HTML file that looks roughly like this:


  
    
      hi
      " hello "
      world!
    
  
  
    
      abc
      " def ghijkl "
      mno
      " pqr!"

I tried to parse it with the following code:

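from lxml import html

# code is assumed to be an HTTP response object whose .content holds the page
tree = html.fromstring(code.content)
sol = tree.xpath('//ol//text()')
for x in sol:
    print(x)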

This is the output I get:

hi
 hello 
world!
abc
 def ghijkl
mno
 pqr!

How can I get all the text in each <li> tag on one line? That is, I want the output to be:

hi hello world!
abc def ghijkl mno pqr!

Nehal J Wani · Accepted Answer

$ cat a.py
from lxml import etree

xml = """
  
    
      hi
      " hello "
      world!
    
  
  
    
      abc
      " def ghijkl "
      mno
      " pqr!"
    
  
"""

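tree = etree.fromstring(xml)
# Select every <li> nested anywhere under an <ol>
sol = tree.xpath('//ol//li')
for a in sol:
    # itertext() yields each text fragment inside the element, in document order
    print(" ".join([t.strip() for t in a.itertext()]).strip())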

$ python a.py
hi " hello " world!
abc " def ghijkl " mno " pqr!"
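
For reference, here is a minimal Python 3 sketch of the same idea: select each <li>, collect its text fragments with itertext(), and join them into one line. The markup below is hypothetical (the inline <b> tags are an assumption, since the exact tags in the question's file are not shown), and SAMPLE and li_lines are names made up for illustration. It also collapses runs of whitespace with str.split(), which goes slightly beyond what the code above does.

```python
from lxml import html

# Hypothetical markup: the inline <b> tags are an assumption for illustration only.
SAMPLE = """
<html><body>
  <ol>
    <li>
      <b>hi</b>
      hello
      <b>world!</b>
    </li>
  </ol>
  <ol>
    <li>
      <b>abc</b>
      def ghijkl
      <b>mno</b>
      pqr!
    </li>
  </ol>
</body></html>
"""


def li_lines(markup):
    """Return one whitespace-normalized string per <li> found under an <ol>."""
    tree = html.fromstring(markup)
    lines = []
    for li in tree.xpath("//ol//li"):
        # itertext() yields every text fragment inside the element, in document order.
        joined = " ".join(li.itertext())
        # split() with no arguments trims the ends and collapses whitespace runs.
        lines.append(" ".join(joined.split()))
    return lines


for line in li_lines(SAMPLE):
    print(line)
```

With this sample, each <li> comes out as a single line: "hi hello world!" and "abc def ghijkl mno pqr!". If the real file contains literal quote characters in the text, they will appear in the output too, as in the run above.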
