Reputation: 804
I would like to scrape an image file name from a page:
<body class="body_class" style="background:#444;">
<div class="data" id="id">
<div id="images" style="cursor: auto;">
<img id="page-1" src="image1.jpg" data-index="1" style="" data-bd-imgshare binded="1">
<p class="img_info">(1/14)</p>
</div>
</div>
</body>
I would like to get the value image1.jpg.
I tried this code:
from lxml import html
import requests
page = requests.get(r'http://example.com')
tree = html.fromstring(page.content)
a = tree.xpath('//div[@id="images"]/src/text()')
It fails. How can I get the data?
Thanks.
Upvotes: 0
Views: 27
Reputation: 649
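Are you looking for the text "image1.jpg" as the data?
If so, simply use this XPath: //div[@id="images"]//@src
It works because src is an attribute of the img element, not a child element, so it has to be selected with @src rather than /src/text().
If you also want to download the image using the address in src, you can use: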
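As a minimal sketch of that XPath in use, the snippet below runs it against the markup from the question, pasted in as a string so that no network request is needed. The variable names sample and srcs are only for illustration; in the original code you would parse page.content instead.

```python
from lxml import html

# Markup copied from the question; normally this would be page.content.
sample = '''
<body class="body_class" style="background:#444;">
<div class="data" id="id">
<div id="images" style="cursor: auto;">
<img id="page-1" src="image1.jpg" data-index="1" style="" data-bd-imgshare binded="1">
<p class="img_info">(1/14)</p>
</div>
</div>
</body>
'''

tree = html.fromstring(sample)

# @src selects the attribute value on any descendant of the div with id="images".
srcs = tree.xpath('//div[@id="images"]//@src')
print(srcs)
```

The result is a list, so take srcs[0] if you expect exactly one image.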
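# Python 3; on Python 2 the function is urllib.urlretrieve
from urllib.request import urlretrieve
# Download the image at the given URL and save it as 00000001.jpg
urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg",
            "00000001.jpg")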
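Note that the src in the question is a relative path (image1.jpg), so it probably needs to be combined with the page address before it can be downloaded. A rough sketch for Python 3 follows; page_url is a made-up address used purely as an example, not the real page.

```python
from urllib.parse import urljoin  # Python 3 standard library

# Hypothetical page address, assumed only for this example.
page_url = 'http://example.com/gallery/index.html'
# Relative src value as extracted from the img element.
src = 'image1.jpg'

# Resolve the relative path against the page address.
image_url = urljoin(page_url, src)
print(image_url)
```

The resulting image_url can then be passed to the download call as its first argument.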
Upvotes: 1