Reputation: 62
I am having trouble getting the next-page link with Python (Scrapy).
Code
import scrapy
from scrapy.http import Request
from gharbheti.items import GharbhetiItem
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import TakeFirst, Identity, MapCompose, Join, Compose
from urllib.parse import urljoin

class ListSpider(scrapy.Spider):
    name = 'list'
    allowed_domains = ['gharbheti.com']
    start_urls = ['https://www.gharbheti.com/sale','https://www.gharbheti.com/rent']

    def parse(self, response):
        properties=response.xpath('//li[@class="col-md-6 Search_building"]/descendant::a')
        for property in properties:
            link=property.xpath('./@href').extract_first()
            urls=response.urljoin(link)
            yield Request(urls,callback=self.parse_property, meta={'URL':urls, })

    def parse_property(self, response):
        l = ItemLoader(item=GharbhetiItem(), response=response)
        URL=response.meta.get('URL')
        l.add_value('URL', response.url)
        l.add_xpath('Title','//div[@class="product-page-meta"]/h4/em/text()',MapCompose(str.strip,str.title))
        l.add_xpath('Offering','//figcaption[contains(text(), "For Sale")]/text()|//figcaption[contains(text(),"For Rent")]/text()',MapCompose(lambda i:i.replace('For',''),str.strip))
        l.add_xpath('Price','//div[@class="deal-pricebox"]/descendant::h3/text()',MapCompose(str.strip))
        l.add_xpath('Type','//ul[@class="suitable-for"]/li/text()',MapCompose(str.strip))
        bike_parking=response.xpath('//i[@class="fa fa-motorcycle"]/following-sibling::em/text()').extract_first()
        car_parking=response.xpath('//i[@class="fa fa-car"]/following-sibling::em/text()').extract_first()
        parking=("Bike Parking: {} Car Parking: {}".format(bike_parking,car_parking))
        l.add_value('Parking',parking)
        l.add_xpath('Description','//div[@class="comment more"]/text()',MapCompose(str.strip))
        l.add_xpath('Bedroom','//i[@class="fa fa-bed"]/following-sibling::text()',MapCompose(lambda i:i.replace('Total Bed Room:',''),str.strip,int))
        l.add_xpath('Livingroom','//i[@class="fa fa-inbox"]/following-sibling::text()',MapCompose(lambda i:i.replace('Total Living Room:',''),str.strip,int))
        l.add_xpath('Kitchen','//i[@class="fa fa-cutlery"]/following-sibling::text()',MapCompose(lambda i:i.replace('Total kitchen Room:',''),str.strip,int))
        l.add_xpath('Bathroom','//i[@class="fa fa-puzzle-piece"]/following-sibling::text()',MapCompose(lambda i:i.replace('Total Toilet/Bathroom:',''),str.strip,int))
        l.add_xpath('Address','//b[contains(text(), "Map")]/text()',MapCompose(lambda i:i.replace('Map Loaction :-',''),str.strip))
        l.add_xpath('Features','//div[@class="list main-list"]/ul/li/text()',MapCompose(str.strip))
        images=response.xpath('//div[@class="carousel-inner dtl-carousel-inner text-center"]/descendant::img').extract()
        images=[s.replace('<img src="', '') for s in images]
        images=[i.split('?')[0] for i in images]
        Image=["http://www.gharbheti.com" + im for im in images]
        l.add_value('Images',Image)
        return l.load_item()
I can't retrieve the next page from the network. For another site, this is what I did (simple pagination without JavaScript):
next_page=response.urljoin(response.xpath('//a[contains(text(), "Next")]/@href').extract_first())
yield Request(next_page, callback=self.parse)
Upvotes: 0
Views: 299
Reputation: 303
Because the pagination uses JavaScript, there is no link in the page's source code.
To see what's happening, open your browser's inspector and watch the network activity while clicking 'Load More'.
The inspector will show you that the site is sending an async POST form request to https://www.gharbheti.com/RoomRentHome/GetPropertiesForRent, with two values for the form data:
RentTypeId: 0 (not sure what this is, but I'm sure you can figure it out if you need to know)
page: 1 (gets incremented with every click on 'Load More')
You'll have to take a programmatic approach using Scrapy's FormRequest. It looks like every page yields 10 more properties, so if you want to get the next 1000 after the initial page load you could write
for i in range(1,101):
    <send a form request with i as the page value>
I assume that the data format coming back from the POST is not the same as the site homepage, so you may have to define another callback function to parse that data.
Upvotes: 2