Reputation: 1694
I want to extract part of an HTML document using XPath in Python. I want the HTML fragment itself, including both tags and text. The question "Get all text inside a tag in lxml" is about extracting only the text, so the two questions are different.
<html>
<body>
<div class ="item">
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a> </li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
<div class = "movie">
<div title = "name">
<ul>
<li class="item-0"><a href="link1.html">movie a</a></li>
<li class="item-1"><a href="link2.html">movie b</a></li>
<li class="item-inactive"><a href="link3.html">movie c</a></li>
<li class="item-1"><a href="link4.html">movie d</a></li>
</ul>
</div>
</div>
</body>
</html>
Specifically, I want to extract only the following HTML from the document above:
<div title = "name">
<ul>
<li class="item-0"><a href="link1.html">movie a</a></li>
<li class="item-1"><a href="link2.html">movie b</a></li>
<li class="item-inactive"><a href="link3.html">movie c</a></li>
<li class="item-1"><a href="link4.html">movie d</a></li>
</ul>
</div>
My code:
import requests
from lxml import html

page = requests.get('........html')
tree = html.fromstring(page.content)
# xpath() returns a list of matching Element objects
body = tree.xpath('//div[contains(@title, "name")]')
print('body:', body)
But the result is:
<Element div at 0x103620e58>
I want to get all the elements in this part of the HTML, for example
<ul> <li>.
Please use XPath rather than another method.
Upvotes: 0
Views: 2134
Reputation: 9627
I want to get all the elements in this part of the HTML, for example <ul> <li>
Try to use:
body = tree.xpath('//div[contains(@title, "name")]/ul')
Update (thanks to @RafaelAlmeida): for all elements below the div, use:
body = tree.xpath('//div[contains(@title, "name")]//*')
Upvotes: 2