Jav Hens

Reputation: 73

Why can't Scrapy find an XPath that my browser finds?

I'm new to Scrapy. With the code below I can extract the name, but not the price. Any idea what I'm doing wrong when extracting the price? Thank you!

This is the code:

import scrapy


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['boerse-frankfurt.de']  # domain names only, not full URLs
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        what_name = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[1]/div/app-widget-datasheet-header/div/div/div/div/div[1]/div/h1/text()').extract_first()
        what_price = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[2]/div[3]/div[1]/font/text()').extract_first()
        yield {'name': what_name, 'price': what_price}

These are the two items (marked in red), the name and the price, on the HTML page I take the information from.

Upvotes: 0

Views: 79

Answers (1)

msenior_

Reputation: 2110

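The name is available directly in the page's HTML, but the price is loaded from an API. Scrapy does not execute JavaScript, so the price element you see in the browser is not in the response Scrapy receives. If you inspect the network traffic in your browser's developer tools, you will find an API call that returns the price information. The example below shows how you could obtain this data.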

import scrapy
from time import time


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['boerse-frankfurt.de']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'
    }
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        item = {}
        current_time = int(time())
        # The name, ISIN and MIC are all present in the static HTML.
        name = response.xpath('//h1/text()').get()
        isin = response.xpath("//span[contains(text(),'ISIN:')]/text()").re_first(r"ISIN:\s(.*)$")
        mic = response.xpath("//app-widget-index-price-information/@mic").get()
        # The price comes from this API endpoint. Adjacent string literals are
        # used so that no indentation whitespace ends up inside the URL.
        api_url = (
            "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"
            "?resolution=D&isKeepResolutionForLatestWeeksIfPossible=false"
            f"&from={current_time}&to={current_time}"
            f"&isBidAskPrice=false&symbols={mic}%3A{isin}"
        )

        item['name'] = name
        item['isin'] = isin
        item['mic'] = mic
        # Pass the partially filled item on to the callback that adds the price.
        yield response.follow(api_url, callback=self.parse_price, cb_kwargs={"item": item})

    def parse_price(self, response, item):
        # The API returns JSON; take the first value of the first series.
        item['price'] = response.json()[0]['quotes']['timeValuePairs'][0]['value']
        yield item

Running the above spider will yield a dictionary similar to the following:

{'name': 'FCE Bank PLC 1,134% 15/22', 'isin': 'XS1186131717', 'mic': 'XFRA', 'price': 99.955}
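
If you want to keep the URL construction and the JSON lookup out of the spider methods, they could be pulled into two small functions, roughly as sketched below. The names `build_history_url` and `extract_price` are hypothetical helpers, not part of Scrapy, and the payload shape is only inferred from the indexing used in `parse_price` above; the real response may contain more fields or change over time.

```python
from urllib.parse import urlencode

# Endpoint taken from the spider above.
API_BASE = "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"


def build_history_url(mic, isin, timestamp):
    """Build the price-history URL for one instrument (hypothetical helper)."""
    params = {
        "resolution": "D",
        "isKeepResolutionForLatestWeeksIfPossible": "false",
        "from": timestamp,
        "to": timestamp,
        "isBidAskPrice": "false",
        "symbols": f"{mic}:{isin}",  # urlencode escapes ':' as '%3A'
    }
    return f"{API_BASE}?{urlencode(params)}"


def extract_price(payload):
    """Return the first quoted value, or None if the expected keys are missing.

    Assumes the shape used in parse_price:
    [{"quotes": {"timeValuePairs": [{"value": ...}, ...]}}, ...]
    """
    try:
        return payload[0]["quotes"]["timeValuePairs"][0]["value"]
    except (IndexError, KeyError, TypeError):
        return None
```

In the spider, `api_url = build_history_url(mic, isin, current_time)` and `item['price'] = extract_price(response.json())` would then replace the inline versions. Returning None for an empty or unexpected payload means the item is still yielded instead of the callback raising an exception.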

Upvotes: 1
