Hat hout

Reputation: 491

How to scrape data from multiple pages into the same CSV row?

I need to scrape data from multiple pages. First the spider should scrape data from the first page, then extract a URL to a second page from it and scrape some data from that page, too.

All of it should end up in the same CSV row.

This is the first page: https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=bWFya2V0PT1nZW5lcmFsfHxzdD09MjB8fHN0cz09eyIxMCI6IlJlZ2lvbiIsIjIwIjoiTWlkZGxlIEVhc3QifQ%3D%3D

An example of the data is the first row of the table, e.g. catalog, model, production, and series.

This is the second page: https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=bWFya2V0PT1nZW5lcmFsfHxzdD09MzB8fHN0cz09eyIxMCI6IlJlZ2lvbiIsIjIwIjoiTWlkZGxlIEVhc3QiLCIzMCI6IjRSVU5ORVIgNjcxMzYwIn18fGNhdGFsb2c9PTY3MTM2MHx8cmVjPT1CMw%3D%3D

An example of the data: series, engine, production date.

Both should be together on the same CSV row, as in the screenshot.

This is my code:

import scrapy

from scrapy.http import Request

from properties.items import PropertiesItem


class BasicSpider(scrapy.Spider):
    name = "manual"

    # This is the page from which I navigate to the Middle East region.
    start_urls = ["https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en"]

    def parse(self, response):
        # First page: take the token from the onclick handler of the second
        # table row and build the URL of the region page.
        token = response.xpath('//*[@id="rows"]/tr[2]/@onclick').re(r"HM\.set\('([^']+)'")[0]
        next_url = "https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=" + token
        yield Request(next_url, callback=self.parse_item)

    def parse_item(self, response):
        for tr in response.xpath("/html/body/table[2]/tr/td/table/tr")[1:]:
            item = PropertiesItem()

            # Fields from the first page.
            item['Series'] = tr.xpath("td[1]/text()").extract()
            item['Engine'] = tr.xpath("td[2]/text()").extract()
            # URL of the second page for this row, built from the row's
            # onclick handler.
            second_selector = "https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=" + tr.xpath('@onclick').re(r"HM\.set\('([^']+)'")[0]

            yield item

    def parse_item_2(self, response):
        item = PropertiesItem()
        item['Building_Condition'] = response.xpath('/html/body/table[2]/tr/td/table/tr[2]/td[1]/text()').extract()
        yield item

I need to write some code in parse_item that goes to parse_item_2, handles the second page, and gets the results onto the same CSV row. How can I do that?

Upvotes: 0

Views: 358

Answers (1)

Fran

Reputation: 81

If you want to build a single item from data spread across different URLs, you should pass the item from one Request to the next using the meta attribute. Finally, you yield the completed item from the last callback, so it is written as a single row.

def parse_item(self, response):
    for tr in response.xpath("/html/body/table[2]/tr/td/table/tr")[1:]:
        [...]
        second_selector = [...]
        meta = {'item': item}
        yield Request(second_selector, meta=meta, callback=self.parse_item_2)

def parse_item_2(self, response):
    item = PropertiesItem(response.meta['item'])
    item['Building_Condition'] = response.xpath('/html/body/table[2]/tr/td/table/tr[2]/td[1]/text()').extract()
    yield item
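
For completeness, here is a minimal self-contained sketch of the whole chain. The spider and class names and the use of a plain dict instead of PropertiesItem are only illustrative choices; the XPaths and field names come from the question, so adjust them to the real pages.

import scrapy
from scrapy.http import Request

BASE = "https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l="

class ChainedSpider(scrapy.Spider):
    name = "chained"
    start_urls = ["https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en"]

    def parse(self, response):
        # Token for the region page, taken from the row's onclick handler.
        token = response.xpath('//*[@id="rows"]/tr[2]/@onclick').re(r"HM\.set\('([^']+)'")[0]
        yield Request(BASE + token, callback=self.parse_item)

    def parse_item(self, response):
        for tr in response.xpath("/html/body/table[2]/tr/td/table/tr")[1:]:
            # Fields from the first page.
            item = {
                'Series': tr.xpath("td[1]/text()").extract_first(),
                'Engine': tr.xpath("td[2]/text()").extract_first(),
            }
            token = tr.xpath('@onclick').re(r"HM\.set\('([^']+)'")
            if token:
                # Carry the partially filled item to the second page's callback.
                yield Request(BASE + token[0], meta={'item': item},
                              callback=self.parse_item_2)
            else:
                # No second page for this row: yield what we have.
                yield item

    def parse_item_2(self, response):
        item = response.meta['item']
        # Field from the second page; one yield per row, so each CSV row
        # contains data from both pages.
        item['Building_Condition'] = response.xpath(
            '/html/body/table[2]/tr/td/table/tr[2]/td[1]/text()').extract_first()
        yield item

Running it with scrapy crawl chained -o output.csv writes each yielded item as one CSV row. In Scrapy 1.7 and later you can also pass the item via cb_kwargs instead of meta, which makes it an explicit argument of the callback.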

Upvotes: 1
