Anton Barycheuski

Reputation: 720

How to convert XPath Element to plain html text?

I have page:

<body>
  <div>
    <a id="123">text_url</a>
  </div>    
<body>

I want to get the element matched by '//div/a' as plain HTML text:

<a id="123">text_url</a>

How can I do it?

Upvotes: 2

Views: 7185

Answers (4)

Robᵩ

Reputation: 168646

If you have already parsed the document with lxml, you can serialize the matched element with lxml.etree.tostring():

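from lxml import etree

xml = '''<body>
  <div>
    <a id="123">text_url</a>
  </div>
</body>'''

root = etree.fromstring(xml)
for a in root.xpath('//div/a'):
    # method='html' serializes as HTML; with_tail=False drops the text after </a>
    print(etree.tostring(a, method='html', with_tail=False, encoding='unicode'))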

Upvotes: 2

Saish

Reputation: 521

You could use the xml.etree.ElementTree module from the Python standard library.

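from xml.etree.ElementTree import parse, tostring

doc = parse('page.xml')  # assuming page.xml is on disk
a = doc.find('div/a[@id="123"]')
a.tail = None  # don't serialize the whitespace that follows </a>
print(tostring(a, encoding='unicode'))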

Note that this only works for strict XML. For example, your closing body tag is incorrect (<body> instead of </body>), so this code would fail on your page. HTML on the web is rarely strict XML.
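A minimal standard-library sketch of the two points in this answer: serializing the found element with ElementTree.tostring() gives its markup (reading .text would give only text_url), and the page as posted is rejected by the XML parser. outer_xml is a hypothetical helper name; the element is shallow-copied so its tail can be cleared without modifying the parsed tree.

```python
import copy
import xml.etree.ElementTree as ET

def outer_xml(source, path):
    """Hypothetical helper: markup of the first element matching an ElementPath."""
    root = ET.fromstring(source)  # raises ET.ParseError on malformed XML
    elem = root.find(path)
    if elem is None:
        return None
    elem = copy.copy(elem)  # shallow copy, so clearing the tail leaves the tree intact
    elem.tail = None        # tostring() would otherwise append the text after </a>
    return ET.tostring(elem, encoding='unicode')

strict = '<body>\n  <div>\n    <a id="123">text_url</a>\n  </div>\n</body>'
as_posted = strict[:-len('</body>')] + '<body>'  # the stray <body> from the question

print(outer_xml(strict, 'div/a[@id="123"]'))

try:
    outer_xml(as_posted, 'div/a[@id="123"]')
except ET.ParseError as err:
    print('not well-formed XML:', err)
```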

Upvotes: 0

vks

Reputation: 67968

You can use Python's re module with re.findall:

import re

x = '<body><div><a id="123">text_url</a></div></body>'  # the page HTML from the question, on one line
print(re.findall(r".*?(<a.*?<\/a>).*", x, re.DOTALL))

Output: ['<a id="123">text_url</a>']

See the demo as well:

http://regex101.com/r/lF4lY6/1
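One detail of this pattern: with re.DOTALL, the trailing .* consumes the rest of the input, so findall returns at most one match. A minimal sketch comparing it with a pattern that drops the surrounding .*? and .*, on a hypothetical two-link page. Regex matching stays a rough tool for HTML; for example, it does not understand nesting or comments.

```python
import re

page = '<div><a id="1">one</a> <a id="2">two</a></div>'  # hypothetical input

# Pattern from the answer: the trailing .* eats everything after the first link.
first_only = re.findall(r".*?(<a.*?<\/a>).*", page, re.DOTALL)

# Without the surrounding .*? and .*, findall scans for every <a ...>...</a> run.
# \b stops <a from also matching tags such as <abbr>.
all_links = re.findall(r"<a\b[^>]*>.*?</a>", page, re.DOTALL)

print(first_only)
print(all_links)
```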

Upvotes: 0

Anton Barycheuski

Reputation: 720

A working solution in Python using the grab module:

from grab import Grab

g = Grab()
g.go('file://page.htm')
print(g.doc.select('//div/a')[0].html())

Output: <a id="123">text_url</a>

Upvotes: 0
