I am scraping this website http://www.propertyfinder.ae/en/buy/villa-for-sale-dubai-jumeirah-park-1849328.html?img/0
and I want to get the src of every image inside div[@id='propertyPhoto'].
I tried this XPath:
.//div[@id='propertyPhoto']//img/@src
and then I loop over the results to extract the src, but I only get the first image's src.
Help, please.
Only the main image is inside div#propertyPhoto. The others are inside li#propertyPhotoMini0, li#propertyPhotoMini1, ...
So the XPath needs a slight modification to match both. Their id attributes all start with propertyPhoto, so you can use the following XPath:
.//*[starts-with(@id, 'propertyPhoto')]//img/@src
Example:
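from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from scrapy.selector import Selector

page_url = 'http://www.propertyfinder.ae/en/buy/villa-for-sale-dubai-jumeirah-park-1849328.html?img/0'
# Selector(text=...) expects a str, so decode the bytes (assumes the page is UTF-8)
html = urlopen(page_url).read().decode('utf-8')
root = Selector(text=html, type='html')

# Match img elements under any element whose id starts with "propertyPhoto"
# (div#propertyPhoto as well as li#propertyPhotoMini0, li#propertyPhotoMini1, ...)
for src in root.xpath(".//*[starts-with(@id, 'propertyPhoto')]//img/@src").extract():
    print(src)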
Output:
http://c1369023.r23.cf3.rackcdn.com/1849328-1-wide.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-1-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-2-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-3-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-4-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-5-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-6-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-7-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-8-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-9-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-10-mini.jpg
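The difference between the two XPath expressions can be reproduced offline. Below is a rough stdlib-only sketch (not how Scrapy evaluates XPath) that mimics `.//*[<id test>]//img/@src` on a small hypothetical snippet shaped like the structure described above. The HTML, the file names and the helper names (`ImgSrcCollector`, `collect_img_srcs`) are made up for illustration, and it assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML, so they are not pushed on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ImgSrcCollector(HTMLParser):
    """Collects img src values that have an ancestor whose id passes id_matches.

    Roughly emulates .//*[<id test>]//img/@src (hypothetical helper, assumes
    well-formed HTML).
    """

    def __init__(self, id_matches):
        super().__init__()
        self.id_matches = id_matches
        self.stack = []  # one bool per open element: is it (or an ancestor) a match?
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside = bool(self.stack and self.stack[-1])
        if tag == "img":
            # The img must be a descendant of a matching element, not the match itself
            src = attrs.get("src")
            if inside and src is not None:
                self.srcs.append(src)
            return
        if tag in VOID:
            return
        self.stack.append(inside or self.id_matches(attrs.get("id") or ""))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


def collect_img_srcs(html, id_matches):
    parser = ImgSrcCollector(id_matches)
    parser.feed(html)
    parser.close()
    return parser.srcs


# Hypothetical markup mirroring div#propertyPhoto plus li#propertyPhotoMiniN thumbnails
SAMPLE = """
<div id="propertyPhoto"><img src="1-wide.jpg"></div>
<ul>
  <li id="propertyPhotoMini0"><a href="#"><img src="1-mini.jpg"></a></li>
  <li id="propertyPhotoMini1"><a href="#"><img src="2-mini.jpg"></a></li>
</ul>
"""

if __name__ == "__main__":
    # Like .//div[@id='propertyPhoto']//img/@src
    print(collect_img_srcs(SAMPLE, lambda i: i == "propertyPhoto"))
    # → ['1-wide.jpg']
    # Like .//*[starts-with(@id, 'propertyPhoto')]//img/@src
    print(collect_img_srcs(SAMPLE, lambda i: i.startswith("propertyPhoto")))
    # → ['1-wide.jpg', '1-mini.jpg', '2-mini.jpg']
```

The exact-id test only ever sees the single image in the main div, which matches the symptom in the question; the prefix test also picks up the thumbnails in the li elements.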