Tanhaeirad

Reputation: 349

How can I extract the value of a variable from a script element in Scrapy?

I need to extract some data from a website. I found that everything I need is in a <script> element, so I extracted its text with this command:

script = response.css('[id="server-side-container"] script::text').get()

And this is the value of script:

    window.bottomBlockAreaHtml = '';
    ...
    window.searchQuery = '';
    window.searchResult = {
  "stats": {...},
  "products": {...},
  ...
  };
    window.routedFilter = '';
  ...
    window.searchContent = '';

What is the best way to get the value of "products" in my Python code?

Upvotes: 0

Views: 153

Answers (1)

Alexander

Reputation: 17355

In your example, the best strategy would be to use a regex to extract the value assigned to window.searchResult, convert it to a dictionary with json.loads(), and then read the value under the "products" key.

For example:

import json
import re

import scrapy


class LoplabbetSpider(scrapy.Spider):

    name = "loplabbet"
    start_urls = ["https://www.loplabbet.se/lopning/"]
    # Capture the object literal assigned to window.searchResult, up to the first "};"
    pattern = re.compile(r'window\.searchResult = (\{.*?\});', flags=re.DOTALL)

    def parse(self, response):
        # Check every <script> element on the page for the assignment
        for script in response.css("script").getall():
            matches = self.pattern.findall(script)
            if matches:
                # Parse the captured text as JSON and pull out "products"
                results = json.loads(matches[0])
                product = results["products"]
                yield product
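
One caveat with the non-greedy pattern above: it stops at the first "};" it sees, so it would cut the object short if that sequence ever appeared inside a string value. A possible alternative, sketched below, is to use the regex only to find where the assignment starts and let json.JSONDecoder.raw_decode() read exactly one JSON value from that position. The helper name extract_js_object and the sample script text are made up for illustration, and the sketch assumes the assigned value is valid JSON (it would fail on single-quoted strings such as '').

```python
import json
import re


def extract_js_object(script_text, var_name):
    """Return the JSON value assigned to window.<var_name>, or None if absent.

    Hypothetical helper; assumes the assigned value is valid JSON.
    """
    # Locate the assignment and consume the whitespace after "=",
    # because raw_decode does not skip leading whitespace itself.
    match = re.search(r"window\." + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and later statements).
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


# Made-up script text shaped like the one in the question.
sample = """
    window.searchQuery = '';
    window.searchResult = {
  "stats": {"total": 2},
  "products": {"items": [{"name": "a};b"}]}
  };
    window.routedFilter = '';
"""

result = extract_js_object(sample, "searchResult")
products = result["products"]
```

Inside the spider, the same helper could be called on each script string in place of self.pattern.findall(script).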

Upvotes: 1
