Anurag Sharma

Reputation: 5049

Get list items inside div tag using XPath

I have HTML like this:

<div id="all-stories" class="book"> 
<ul>

<li title="Book1"  ><a href="book1_url">Book1</a></li>

<li title="Book2"  ><a href="book2_url">Book2</a></li>
</ul>

</div>

I want to get the books and their respective URLs using XPath, but my approach is not working. For simplicity, I tried to extract all the elements under the "li" tags as follows:

lis = tree.xpath('//div[@id="all-stories"]/div/text()')

Upvotes: 9

Views: 28321

Answers (1)

unutbu

Reputation: 880777

import lxml.html as LH

content = '''\
<div id="all-stories" class="book"> 
<ul>

<li title="Book1"  ><a href="book1_url">Book1</a></li>

<li title="Book2"  ><a href="book2_url">Book2</a></li>
</ul>

</div>
'''
root = LH.fromstring(content)
for atag in root.xpath('//div[@id="all-stories"]//li/a'):
    print(atag.attrib['href'], atag.text_content())

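yields (on Python 2, where print is given a tuple)

('book1_url', 'Book1')
('book2_url', 'Book2')

On Python 3 the same code prints book1_url Book1 and book2_url Book2, without the parentheses and quotes.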

The XPath //div[@id="all-stories"]/div does not match anything because there is no child div inside the outer div tag.

The XPath //div[@id="all-stories"]/li would not match either, because there is no direct child li tag inside the div tag; the li tags sit one level deeper, inside the ul. However, //div[@id="all-stories"]//li does match the li tags, because // tells XPath to search all descendants, at any depth, rather than only direct children.
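To make the difference between / and // concrete, here is a small sketch that runs the expressions discussed above against the same HTML and counts the matches. The variable names are only illustrative, and the /ul/li variant is an extra expression not mentioned above, included to show that spelling out each step also works.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# "/" selects direct children only, so neither of these matches anything.
child_divs = root.xpath('//div[@id="all-stories"]/div')
child_lis = root.xpath('//div[@id="all-stories"]/li')

# "//" selects descendants at any depth, so the li tags under ul are found.
descendant_lis = root.xpath('//div[@id="all-stories"]//li')

# Naming every step with "/" also works, because ul is a direct child of the div.
explicit_lis = root.xpath('//div[@id="all-stories"]/ul/li')

print(len(child_divs), len(child_lis), len(descendant_lis), len(explicit_lis))
# prints: 0 0 2 2
```

If the nesting of the page is known and stable, the explicit /ul/li form is stricter; // is more forgiving when extra wrapper tags may appear.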

Now, the content you are looking for is not in the li tag. It is inside the a tag. So instead use the XPath '//div[@id="all-stories"]//li/a' to reach the a tags. The value of the href attribute can be accessed with atag.attrib['href'], and the text with atag.text_content().
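As a possible variation, the books and URLs can be collected into lists instead of printed. This is only a sketch: the names books, hrefs, and titles are illustrative, and using a.get('href') instead of atag.attrib['href'] is a choice made here so that a link without an href gives None rather than raising KeyError.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# Collect (text, href) pairs from the a tags under each li.
books = [(a.text_content(), a.get('href'))
         for a in root.xpath('//div[@id="all-stories"]//li/a')]

# XPath can also select attribute values directly with @name.
hrefs = root.xpath('//div[@id="all-stories"]//li/a/@href')
titles = root.xpath('//div[@id="all-stories"]//li/@title')

print(books)
print(hrefs)
print(titles)
```

Selecting @href or @title in the XPath itself returns strings rather than elements, which is handy when only the attribute values are needed.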

Upvotes: 9
