ColeWorld

Reputation: 279

Scraping element <script> for strings in Python

I am trying to check the stock of size Small on this PAGE (currently 0). Specifically, I want to retrieve the inventory of size Small from this data:

<script>
(function($) { 
  var variantImages = {},
    thumbnails,
    variant,
    variantImage;





       variant = {"id":18116649221,"title":"XS","option1":"XS","option2":null,"option3":null,"sku":"BGT16073100","requires_shipping":true,"taxable":true,"featured_image":null,"available":true,"name":"Iron Lords T-Shirt - XS","public_title":"XS","options":["XS"],"price":2499,"weight":136,"compare_at_price":null,"inventory_quantity":16,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
       if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
         variantImage =  variant.featured_image.src.split('?')[0].replace('http:','');
         variantImages[variantImage] = variantImages[variantImage] || {};



           if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
             variantImages[variantImage]["option-0"] = "XS";
           }
           else {
             var oldValue = variantImages[variantImage]["option-0"];
             if ( oldValue !== null && oldValue !== "XS" )  {
               variantImages[variantImage]["option-0"] = null;
             }
           }

       }










       variant = {"id":18116649285,"title":"Small","option1":"Small","option2":null,"option3":null,"sku":"BGT16073110","requires_shipping":true,"taxable":true,"featured_image":null,"available":false,"name":"Iron Lords T-Shirt - Small","public_title":"Small","options":["Small"],"price":2499,"weight":159,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
       if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
         variantImage =  variant.featured_image.src.split('?')[0].replace('http:','');
         variantImages[variantImage] = variantImages[variantImage] || {};



           if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
             variantImages[variantImage]["option-0"] = "Small";
           }
           else {
             var oldValue = variantImages[variantImage]["option-0"];
             if ( oldValue !== null && oldValue !== "Small" )  {
               variantImages[variantImage]["option-0"] = null;
             }
           }

       }

How can I tell Python to locate the <script> tag and then the specific "inventory_quantity":0 entry, so that it returns the stock of the product for size Small?

Upvotes: 1

Views: 1261

Answers (3)

Juraj Bezručka

Reputation: 502

You can find it using a regex:

import re

s = 'some sample text in which "inventory_quantity":0 appears'
# the raw string keeps \d from being treated as a string escape
occurrences = re.findall(r'"inventory_quantity":(\d+)', s)
print(occurrences[0])  # prints 0

Edit: I assume you can get the whole content of <script>...</script> into a variable t (using lxml, xml.etree, BeautifulSoup, or simply re).
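As a minimal sketch of that step using only the standard library, the snippet below collects the text of every <script> element with html.parser and keeps the one that mentions inventory_quantity. The ScriptCollector class and the sample html string are hypothetical stand-ins, not part of the original page; lxml or BeautifulSoup would work just as well.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self.in_script:
            self.scripts[-1] += data


# Hypothetical page source standing in for the downloaded HTML.
html = '<html><body><script>variant = {"title":"Small","inventory_quantity":0};</script></body></html>'

collector = ScriptCollector()
collector.feed(html)
collector.close()

# Keep the first script that mentions inventory_quantity.
t = next(s for s in collector.scripts if "inventory_quantity" in s)
print(t)
```

With a real page you would feed the downloaded HTML to the collector instead of the sample string.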

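Before we start, let's define some names so that the JavaScript literals evaluate in Python:

true = True
false = False
null = None

Then use a regex to find the dictionary as text and convert it to a dict via eval:

r = re.findall(r'variant = (\{.*?\});', t)

if r:
    # findall returns a list, so evaluate the first match;
    # only use eval on text you trust
    variant = eval(r[0])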

This is what you get:

>>> variant
{'available': True,
 'barcode': '',
 'compare_at_price': None,
 'featured_image': None,
 'id': 18116649221,
 'inventory_management': 'shopify',
 'inventory_policy': 'deny',
 'inventory_quantity': 16,
 'name': 'Iron Lords T-Shirt - XS',
 'option1': 'XS',
 'option2': None,
 'option3': None,
 'options': ['XS'],
 'price': 2499,
 'public_title': 'XS',
 'requires_shipping': True,
 'sku': 'BGT16073100',
 'taxable': True,
 'title': 'XS',
 'weight': 136}

Now you can easily get any information you need.
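To get the stock for one particular size, you would need every variant rather than only the first. Here is a rough sketch of one way to do that, swapping eval for json.loads (the variant objects are valid JSON, so the true/null definitions are not needed). The stock_by_size helper and the shortened sample text are my own assumptions for illustration, and the pattern relies on the variant objects containing no nested braces.

```python
import json
import re

# Hypothetical, shortened stand-in for the script text held in `t`.
t = '''
variant = {"title":"XS","available":true,"featured_image":null,"inventory_quantity":16};
if ( typeof variant.featured_image !== 'undefined' ) { }
variant = {"title":"Small","available":false,"featured_image":null,"inventory_quantity":0};
'''


def stock_by_size(script_text):
    """Map each variant title to its inventory_quantity."""
    stock = {}
    # Non-greedy match up to the first "};" on the same line.
    for raw in re.findall(r'variant = (\{.*?\});', script_text):
        variant = json.loads(raw)
        stock[variant["title"]] = variant["inventory_quantity"]
    return stock


stock = stock_by_size(t)
print(stock["Small"])
```

Building a dict keyed by title means any size can be looked up afterwards without searching the text again.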

Upvotes: 2

alecxe

Reputation: 474021

Neither of the current answers addresses the problem of locating inventory_quantity by the desired size, which is not straightforward at first glance.

The idea is to avoid too much string parsing: extract the complete product-info JS object into Python via json.loads(), then filter its variants by the desired size. Of course, we first have to locate that JS object, and for this we'll use a regular expression. Remember, this is not HTML parsing at this point, so doing it with a regular expression is pretty much okay - the famous answer about not parsing HTML with regular expressions does not apply in this case.

Complete implementation:

import json
import re

import requests


DESIRED_SIZE = "XS"

pattern = re.compile(r"freegifts_product_json\s*\((.*?)\);", re.MULTILINE | re.DOTALL)

url = "http://bungiestore.com/collections/featured/products/iron-lords-t-shirt-men"
response = requests.get(url)

match = pattern.search(response.text)

# load the extracted string representing the product-info JS object into a Python dict
product_info = json.loads(match.group(1))

# look up the desired size in a list of product variants
for variant in product_info["variants"]:
    if variant["title"] == DESIRED_SIZE:
        print(variant["inventory_quantity"])
        break

Prints 16 at the moment.

By the way, we could have also used a JavaScript parser, like slimit - here is a sample working solution:

Upvotes: 1

Glenn

Reputation: 5071

Assuming you can get the block of code into a string format, and assuming the format of the code doesn't change too much, you could do something like this:

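before = '"inventory_quantity":'
after = ',"inventory_management"'

# note: this finds the first variant in the text (XS in the sample data)
start = mystr.index(before) + len(before)
end = mystr.index(after, start)  # search only from where the value begins

print(mystr[start:end])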
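To target a specific size with the same slicing idea, one option is to first find where that variant's "title" appears and only then look for the quantity. This is a sketch under the same assumptions about the text format; the inventory_for_size helper and the shortened sample string are hypothetical.

```python
def inventory_for_size(mystr, size):
    """Return the inventory_quantity of the variant whose title is `size`."""
    marker = '"title":"%s"' % size
    before = '"inventory_quantity":'
    after = ',"inventory_management"'

    pos = mystr.index(marker)                        # start of the wanted variant
    start = mystr.index(before, pos) + len(before)   # first quantity after it
    end = mystr.index(after, start)
    return int(mystr[start:end])


# Hypothetical, shortened stand-in for the script text.
mystr = (
    'variant = {"id":1,"title":"XS","inventory_quantity":16,"inventory_management":"shopify"};'
    'variant = {"id":2,"title":"Small","inventory_quantity":0,"inventory_management":"shopify"};'
)

print(inventory_for_size(mystr, "Small"))
```

Like the original snippet, str.index raises ValueError if any of the markers is missing, so this breaks loudly if the page format changes.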

Upvotes: 0
