Andu Gundu Swami

Reputation: 35

Tag value not printing etree lxml

I want to print the "Printable String" part of the markup below. I also tried to print each whole tag, but could only find a way to print the tag name. Retrieving the XPath and the whole tag itself is currently my biggest challenge. Thank you!

Code:

from bs4 import BeautifulSoup
from lxml import etree

doc = "<p><a></a><a></a>Printable String</p>"
soup = BeautifulSoup(doc, "lxml")
root = etree.fromstring(str(soup))

tree = etree.ElementTree(root)
for i, e in enumerate(root.iter()):
    print(e.text)

Output:

None
None
None
None
None
[Finished in 0.2s]

Expected Output:

None 
None
Printable String
None 
None

Upvotes: 2

Views: 95

Answers (2)

Jack Fleeting

Reputation: 24928

A couple of things to notice:

First, you parse doc with BeautifulSoup and then parse the string of that soup a second time with lxml. The problem with this is that BS doesn't leave the string alone. If you

print(soup)

the output is

<html><body><p><a></a><a></a>Printable String</p></body></html>

You will notice that two new elements (html and body) have been added, which explains why you get five Nones instead of only three.

If you instead parse doc directly with lxml and use XPath:

from lxml import etree

doc = "<p><a></a><a></a>Printable String</p>"
root = etree.fromstring(doc)
for z in root.xpath('//*'):
    # text() selects the text nodes that are direct children of each element
    print(z.xpath('text()'))

Output is

['Printable String']
[]
[]

Upvotes: 1

user2668284

It's as simple as:

from bs4 import BeautifulSoup

doc = "<p><a></a><a></a>Printable String</p>"
soup = BeautifulSoup(doc, "lxml")
print(soup.find('p').text)

...or, if you want a pure etree solution:

from lxml import etree
from io import StringIO

doc = '<p><a></a><a></a>Printable String</p>'

tree = etree.parse(StringIO(doc))
print(tree.xpath('//p/text()')[0])
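
The question also asks for the XPath of each element and for the whole tag rather than just its name. One possible approach, sketched under the assumption that doc is parsed directly with lxml as above, is to use ElementTree.getpath() for the path and etree.tostring() for the markup (the paths and markup lists are illustrative names only):

```python
from lxml import etree

doc = "<p><a></a><a></a>Printable String</p>"
root = etree.fromstring(doc)
tree = etree.ElementTree(root)

# getpath() returns an XPath expression that locates the element in this tree.
paths = [tree.getpath(e) for e in root.iter()]

# tostring() serializes the element itself; with_tail=False leaves out
# any text that follows the element's closing tag.
markup = [
    etree.tostring(e, encoding="unicode", with_tail=False)
    for e in root.iter()
]

for path, xml in zip(paths, markup):
    print(path, xml)
```

With with_tail=False the serialized a elements do not include "Printable String"; leave that argument at its default if the trailing text should be part of the output.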

Upvotes: 0
