Why can't Scrapy find an XPath that my browser finds?

Question

I'm new to Scrapy. With the code below I can extract the name, but not the price. Any idea what I'm doing wrong with the price? Thank you!

This is the code:

import scrapy


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['https://www.boerse-frankfurt.de']
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        what_name = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[1]/div/app-widget-datasheet-header/div/div/div/div/div[1]/div/h1/text()').extract_first()
        what_price = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[2]/div[3]/div[1]/font/text()').extract_first()
        yield {'name': what_name, 'price': what_price}

These are the two items I want (marked in red): the name and the price.

msenior_ · Accepted Answer

The name is present in the HTML that Scrapy downloads, but the price is not. The page fetches the price separately from an API, so it appears in the browser but not in the HTML response that Scrapy parses. If you inspect the Network traffic in your browser's developer tools, you will find the API call that returns the price. The spider below shows one way to obtain it.

import scrapy
from time import time

class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['boerse-frankfurt.de']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'
    }
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        item = {}
        current_time = int(time())
        name = response.xpath('//h1/text()').get()
        isin = response.xpath("//span[contains(text(),'ISIN:')]/text()").re_first(r"ISIN:\s(.*)$")
        mic = response.xpath("//app-widget-index-price-information/@mic").get()
        # Adjacent string literals are joined, so no stray whitespace ends up in the query string.
        api_url = (
            "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"
            "?resolution=D&isKeepResolutionForLatestWeeksIfPossible=false"
            f"&from={current_time}&to={current_time}&isBidAskPrice=false&symbols={mic}%3A{isin}"
        )

        item['name'] = name
        item['isin'] = isin
        item['mic'] = mic
        yield response.follow(api_url, callback=self.parse_price, cb_kwargs={"item": item})

    def parse_price(self, response, item):
        item['price'] = response.json()[0]['quotes']['timeValuePairs'][0]['value']
        yield item
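
As an alternative to writing the query string by hand, it could be assembled with `urllib.parse.urlencode`, which percent-encodes the `:` in the `symbols` value as `%3A`. This is only a sketch: `build_history_url` is a hypothetical helper name, and the endpoint and parameter names are copied from the spider above rather than from any API documentation.

```python
from urllib.parse import urlencode

# Endpoint as used in the spider above.
HISTORY_ENDPOINT = (
    "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"
)


def build_history_url(mic: str, isin: str, timestamp: int) -> str:
    """Build the price-history URL for one instrument (hypothetical helper)."""
    params = {
        "resolution": "D",
        "isKeepResolutionForLatestWeeksIfPossible": "false",
        "from": timestamp,
        "to": timestamp,
        "isBidAskPrice": "false",
        # urlencode escapes the colon, giving the same "%3A" as the spider.
        "symbols": f"{mic}:{isin}",
    }
    return f"{HISTORY_ENDPOINT}?{urlencode(params)}"


print(build_history_url("XFRA", "XS1186131717", 1700000000))
```

Inside `parse`, the result could be passed to `response.follow` in place of the hand-written `api_url`.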

Running the spider above yields a dictionary similar to this:

{'name': 'FCE Bank PLC 1,134% 15/22', 'isin': 'XS1186131717', 'mic': 'XFRA', 'price': 99.955}
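
The chained indexing in `parse_price` raises an exception if the API returns an empty list or omits one of the keys. A slightly more defensive version might look like the sketch below. `extract_price` is a hypothetical helper, and the sample payload is only inferred from how `parse_price` indexes the response; the real response may contain additional fields.

```python
from typing import Any, Optional


def extract_price(payload: Any) -> Optional[float]:
    """Return the first quoted value, or None if the expected keys are missing."""
    try:
        return payload[0]["quotes"]["timeValuePairs"][0]["value"]
    except (IndexError, KeyError, TypeError):
        # Empty list, missing key, or an unexpected type such as None.
        return None


# Assumed payload shape, not captured from the live API.
sample = [{"quotes": {"timeValuePairs": [{"value": 99.955}]}}]

print(extract_price(sample))  # 99.955
print(extract_price([]))      # None
```

In the spider, `item['price'] = extract_price(response.json())` would then yield the item with a `None` price instead of failing the callback.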
