Gaberocksall

Reputation: 835

Python 3 get child elements (lxml)

I am using lxml with html:

from lxml import html
import requests

How would I check whether any of an element's children has class="nearby"? My code (essentially):

url = "https://www.example.com"  # requests needs the scheme (http:// or https://)
Page = requests.get(url)
Tree = html.fromstring(Page.content)
resultList = Tree.xpath('//p[@class="result-info"]')
i = len(resultList) - 1  # to go through the list backwards
while i >= 0:
    if resultList[i].HasChildWithClass("nearby"):  # placeholder: this method does not exist
        print('This result has a child with the class "nearby"')
    i -= 1

How would I replace "HasChildWithClass()" to make it actually work?

Here's an example tree:

...
    <p class="result-info">
        <span class="result-meta">
            <span class="nearby">
                ... #this SHOULD print something
            </span>
        </span>
    </p>
    <p class="result-info">
        <span class="result-meta">
            <span class="FAR-AWAY">
                ... # this should NOT print anything
            </span>
        </span>
    </p>
...

Upvotes: 3

Views: 4911

Answers (2)

Zheng Liu

Reputation: 302

Here is an experiment I did.

In a Python shell, take r = resultList[0] and type:

>>> dir(r)
['__bool__', '__class__', ..., 'find_class', ...

This find_class method looks like exactly what we need. If you check its help text:

>>> help(r.find_class)

you'll confirm the guess. Indeed,

>>> r.find_class('nearby')
[<Element span at 0x109788ea8>]

For the other element, s = resultList[1], in the example HTML you gave:

>>> s.find_class('nearby')
[]

An empty list means no descendant has the class 'nearby', so r.find_class('nearby') can be used directly as the condition of your if statement.
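Applied to the loop from the question, a minimal sketch might look like this. It inlines the example markup as a string (with placeholder text in the spans) instead of fetching a page with requests, and collects the matching indices in a list named matches purely for illustration:

```python
from lxml import html

# The question's example markup, inlined so the sketch runs without a network
# request (assumption: the real page has the same structure).
page = """
<div>
    <p class="result-info">
        <span class="result-meta">
            <span class="nearby">close by</span>
        </span>
    </p>
    <p class="result-info">
        <span class="result-meta">
            <span class="FAR-AWAY">far away</span>
        </span>
    </p>
</div>
"""

tree = html.fromstring(page)
result_list = tree.xpath('//p[@class="result-info"]')

matches = []
# Walk the list backwards, as in the question (indices len-1 down to 0).
for i in range(len(result_list) - 1, -1, -1):
    # find_class returns a (possibly empty) list of matching elements,
    # so its truthiness answers "does anything below here have this class?"
    if result_list[i].find_class("nearby"):
        matches.append(i)
        print(f'Result {i} has a child with the class "nearby"')
```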

Cheers!

Upvotes: 0

KC.

Reputation: 3107

I'm not sure why you're using lxml to find the element; BeautifulSoup or re may be a better choice.

lxml = """
    <p class="result-info">
        <span class="result-meta">
            <span class="nearby">
                ... #this SHOULD print something
            </span>
        </span>
    </p>
    <p class="result-info">
        <span class="result-meta">
            <span class="FAR-AWAY">
                ... # this should NOT print anything
            </span>
        </span>
    </p>
    """

But here is how to do what you want with lxml:

from lxml import html

Tree = html.fromstring(lxml)
resultList = Tree.xpath('//p[@class="result-info"]')
for result in reversed(resultList):  # go through the list backwards
    for e in result.iter():
        if e.attrib.get("class") == "nearby":
            print(e.text)
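One caveat: comparing e.attrib.get("class") == "nearby" only matches when the class attribute is exactly "nearby", so an element with class="nearby highlight" would be missed. Below is a sketch that splits the attribute into its whitespace-separated classes instead; has_descendant_with_class is a hypothetical helper name, not part of lxml, and the markup is a trimmed copy of the example with placeholder text:

```python
from lxml import html


def has_descendant_with_class(element, class_name):
    """Hypothetical helper: True if any element below `element`
    lists `class_name` among its whitespace-separated classes."""
    for e in element.iterdescendants():
        # .get() returns None when there is no class attribute
        classes = (e.get("class") or "").split()
        if class_name in classes:
            return True
    return False


# Note the first "nearby" span carries two classes.
page = """
<p class="result-info">
  <span class="result-meta"><span class="nearby highlight">close</span></span>
</p>
<p class="result-info">
  <span class="result-meta"><span class="FAR-AWAY">far</span></span>
</p>
"""

tree = html.fromstring(page)
results = tree.xpath('//p[@class="result-info"]')
flags = [has_descendant_with_class(r, "nearby") for r in results]
print(flags)  # [True, False]
```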

Alternatively, try bs4 (BeautifulSoup):

from bs4 import BeautifulSoup


soup = BeautifulSoup(lxml,"lxml")
result = soup.find_all("span", class_="nearby")
print(result[0].text)
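If you'd rather stay in lxml, roughly the same lookup can be written as a single XPath expression that keeps only the result-info paragraphs containing a "nearby" element. This is a sketch on a trimmed copy of the example markup (the span texts are placeholders); note that @class="nearby" is an exact string match, just like the comparison in the lxml loop above:

```python
from lxml import html

page = """
<p class="result-info">
  <span class="result-meta"><span class="nearby">close</span></span>
</p>
<p class="result-info">
  <span class="result-meta"><span class="FAR-AWAY">far</span></span>
</p>
"""

tree = html.fromstring(page)
# The second bracketed predicate keeps only result-info paragraphs that have
# a descendant (.//) whose class attribute is exactly "nearby".
with_nearby = tree.xpath('//p[@class="result-info"][.//*[@class="nearby"]]')
print(len(with_nearby))  # 1
```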

Upvotes: 1
