Reputation: 15
I'm trying to scrape a dynamic website and I need Selenium.
The links that I want to scrape only open if I click on that specific element. They are opened by jQuery, so my only option is to click on them, because there is no href attribute or anything else that would give me a URL.
My approach is this one:
# -*- coding: utf-8 -*-
import scrapy
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest


class AnofmSpider(scrapy.Spider):

    name = 'anofm'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.anofm.ro/lmvw.html?agentie=Covasna&categ=3&subcateg=1',
            callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']
        try:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "tableRepeat2"))
            )
        finally:
            html = driver.page_source
            response_obj = Selector(text=html)
            links = response_obj.xpath("//tbody[@id='tableRepeat2']")
            for link in links:
                driver.execute_script("arguments[0].click();", link)
                yield {
                    'Ocupatia': response_obj.xpath("//div[@id='print']/p/text()[1]")
                }
but it won't work.
On the line where I want to click on that element, I get this error:
TypeError: Object of type Selector is not JSON serializable
I kind of understand this error, but I have no idea how to solve it. I somehow need to transform that object from a Selector into something clickable.
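As far as I can tell, the mismatch looks roughly like this (just an illustration of the two object types, with throwaway variable names, not a fix I am sure about): execute_script() sends its arguments over the JSON wire protocol, so it only accepts JSON-serializable values and Selenium WebElement handles, while Scrapy's xpath() returns Selector objects that Selenium knows nothing about.

# Scrapy Selector - parsed from a static HTML string; execute_script() cannot serialize it
selector_node = response_obj.xpath("//tbody[@id='tableRepeat2']")[0]

# Selenium WebElement - a live handle in the browser; this is what execute_script() accepts
web_element = driver.find_element(By.XPATH, "//tbody[@id='tableRepeat2']")
driver.execute_script("arguments[0].click();", web_element)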
I checked online for solutions and also the docs, but I couldn't find anything useful.
Can anybody help me better understand this error and how I should fix it?
Thanks.
Upvotes: 0
Views: 2715
Reputation: 142744
You mix Scrapy objects with Selenium functions, and this causes the problem. I don't know how to convert between the two kinds of objects, but I would simply use only Selenium for this:
finally:
    links = driver.find_elements_by_xpath("//tbody[@id='tableRepeat2']/tr")
    print('len(links):', len(links))

    for link in links:
        # doesn't work for me - even with scrollIntoView()
        #driver.execute_script("arguments[0].scrollIntoView();", link)
        #link.click()

        # open information
        driver.execute_script("arguments[0].click();", link)

        # javascript may need some time to display it
        time.sleep(1)

        # get data
        ocupatia = driver.find_element_by_xpath(".//div[@id='print']/p").text
        ocupatia = ocupatia.split('\n', 1)[0]         # first line
        ocupatia = ocupatia.split(':', 1)[1].strip()  # text after first `:`
        print('Ocupatia -->', ocupatia)

        # close information
        driver.find_element_by_xpath('//button[text()="Inchide"]').click()

        yield {
            'Ocupatia': ocupatia
        }
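As a side note, the fixed time.sleep(1) is only the simplest option; if it turns out to be flaky, an explicit wait on the details panel should also work. A minimal sketch, assuming the div with id print is filled once the row's details are loaded:

# wait until the details panel is visible instead of sleeping a fixed time
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, "//div[@id='print']/p"))
)
ocupatia = driver.find_element_by_xpath("//div[@id='print']/p").text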
Full working code. You can put it all in one file and run python script.py without creating a Scrapy project. You have to change SELENIUM_DRIVER_EXECUTABLE_PATH to the correct path.
import scrapy
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest
import time


class AnofmSpider(scrapy.Spider):

    name = 'anofm'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.anofm.ro/lmvw.html?agentie=Covasna&categ=3&subcateg=1',
            #callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']

        try:
            print("try")
            element = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH, "//tbody[@id='tableRepeat2']/tr/td"))
            )
        finally:
            print("finally")

            links = driver.find_elements_by_xpath("//tbody[@id='tableRepeat2']/tr")
            print('len(links):', len(links))

            for link in links:
                #driver.execute_script("arguments[0].scrollIntoView();", link)
                #link.click()

                # open information
                driver.execute_script("arguments[0].click();", link)

                # javascript may need some time to display it
                time.sleep(1)

                # get data
                ocupatia = driver.find_element_by_xpath(".//div[@id='print']/p").text
                ocupatia = ocupatia.split('\n', 1)[0]         # first line
                ocupatia = ocupatia.split(':', 1)[1].strip()  # text after first `:`
                print('Ocupatia -->', ocupatia)

                # close information
                driver.find_element_by_xpath('//button[text()="Inchide"]').click()

                yield {
                    'Ocupatia': ocupatia
                }


# --- run without project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # save in file CSV, JSON or XML
    'FEEDS': {'output.csv': {'format': 'csv'}},  # new in Scrapy 2.1
    'DOWNLOADER_MIDDLEWARES': {'scrapy_selenium.SeleniumMiddleware': 800},
    'SELENIUM_DRIVER_NAME': 'firefox',
    'SELENIUM_DRIVER_EXECUTABLE_PATH': '/home/furas/bin/geckodriver',
    'SELENIUM_DRIVER_ARGUMENTS': [],  # ['-headless']
})
c.crawl(AnofmSpider)
c.start()
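If you prefer Chrome over Firefox, only the driver settings should need to change. A sketch, where the chromedriver path is just an example and has to match your system:

'SELENIUM_DRIVER_NAME': 'chrome',
'SELENIUM_DRIVER_EXECUTABLE_PATH': '/usr/local/bin/chromedriver',  # example path
'SELENIUM_DRIVER_ARGUMENTS': [],  # ['--headless']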
Upvotes: 1
Reputation: 16187
Actually, the data is also generated by an API call that returns a JSON response, so you can easily scrape it from the API. Here is the working solution along with pagination. Each page contains 8 items, and there are 32 items in total.
CODE:
import scrapy
import json


class AnofmSpider(scrapy.Spider):

    name = 'anofm'

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php?offset=8&cauta=&select=Covasna&limit=8&localitate=',
            method='GET',
            callback=self.parse,
            meta={'limit': 8}
        )

    def parse(self, response):
        resp = json.loads(response.body)
        hits = resp.get('lmv').get('data')
        for h in hits:
            yield {
                'Ocupatia': h.get('OCCUPATION')
            }

        total_limit = resp.get('lmv').get('total')
        next_limit = response.meta['limit'] + 8
        if next_limit <= total_limit:
            yield scrapy.Request(
                url=f'https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php?offset=8&cauta=&select=Covasna&limit={next_limit}&localitate=',
                method='GET',
                callback=self.parse,
                meta={'limit': next_limit}
            )
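Before running the spider you can sanity-check the endpoint and the field names by hand. A quick sketch with requests, assuming the API is still live and returns the same lmv/data/OCCUPATION structure used above:

import requests

url = ('https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php'
       '?offset=8&cauta=&select=Covasna&limit=8&localitate=')
data = requests.get(url).json()

print(data['lmv']['total'])           # total number of offers
for item in data['lmv']['data']:      # first page of 8 items
    print(item['OCCUPATION'])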
Upvotes: 1