Frankie

Reputation: 804

How to get the data using lxml

I would like to scrape an image file name from a page:

<body class="body_class" style="background:#444;">
<div class="data" id="id">
<div id="images" style="cursor: auto;">
<img id="page-1" src="image1.jpg" data-index="1" style="" data-bd-imgshare binded="1">
<p class="img_info">(1/14)</p>
</div>
</div>
</body>

I would like to get the value image1.jpg.

I tried this code:

from lxml import html
import requests
page = requests.get(r'http://example.com')
tree = html.fromstring(page.content)
a = tree.xpath('//div[@id="images"]/src/text()')

It fails. How can I get the data?

Thanks.

Upvotes: 0

Views: 27

Answers (1)

ruhaib

Reputation: 649

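Are you looking for the text "image1.jpg"? If so, use this XPath: //div[@id="images"]//@src. Here src is an attribute of the img element, not a child element, so it has to be selected with @src; /src/text() matches nothing.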
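As a quick check, here is a minimal sketch that runs that XPath against a trimmed copy of the markup from the question. The SAMPLE string is only an assumption standing in for page.content; with a real page you would parse the response as in the question.

```python
from lxml import html

# Trimmed copy of the markup from the question (stand-in for page.content)
SAMPLE = (
    '<body class="body_class" style="background:#444;">'
    '<div class="data" id="id">'
    '<div id="images" style="cursor: auto;">'
    '<img id="page-1" src="image1.jpg" data-index="1" style="">'
    '<p class="img_info">(1/14)</p>'
    '</div>'
    '</div>'
    '</body>'
)

tree = html.fromstring(SAMPLE)

# src is an attribute, so it is selected with @src;
# xpath() returns a list with one string per matching attribute
srcs = tree.xpath('//div[@id="images"]//@src')
first_src = srcs[0] if srcs else None
print(first_src)
```

xpath() always returns a list here, so the result is checked for emptiness before indexing.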

If you want to download the image using the address in src, you can use:

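# Python 3: urlretrieve lives in urllib.request (in Python 2 it was urllib.urlretrieve)
from urllib.request import urlretrieve
urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg",
            "00000001.jpg")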
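Note that the src in the question is a relative address (image1.jpg), so it probably needs to be resolved against the page URL before it can be downloaded. A small sketch, assuming the page URL is the http://example.com used in the question and Python 3:

```python
from urllib.parse import urljoin

page_url = "http://example.com"  # URL requested in the question
src = "image1.jpg"               # value returned by the @src XPath

# Resolve the relative src against the page it came from
image_url = urljoin(page_url, src)
print(image_url)
```

The resulting image_url can then be passed to urlretrieve in place of the hard-coded address.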

Upvotes: 1
