Larsson

Reputation: 39

Extracting Javascript Variable Object Data in Python and Beautiful Soup Web Scraping

I can currently scrape JavaScript data from the response to a POST request, which I send with requests and then parse with Beautiful Soup. However, I only want the product plu, sku, description and brand. I am struggling to find a way to print just the data I need rather than the whole script. This is the text that is printed after I extract the script using Soup. I will be scraping more than one product from multiple POST requests, so the chunk idea is not really suitable.

<script type="text/javascript">
var dataObject = {

platform: 'desktop',
pageType: 'basket',
orderID: '',
pageName: 'Basket',
orderTotal: '92.99',
orderCurrency: 'GBP',
currency: 'GBP',
custEmail: '',
custId: '',
items: [

                {


                        plu: '282013',
                        sku: '653460',
                    category: 'Footwear',
                     description: 'Mayfly Lite Pinnacle Women&#039;s',
                     colour: '',
                     brand: 'Nike',
                     unitPrice: '90',
                     quantity: '1',
                     totalPrice: '90',
                     sale: 'false'
                }                                                       ]

};

As you can see, it is far too much information.

Upvotes: 0

Views: 1445

Answers (1)

JacobIRR

Reputation: 8966

How about this:

  1. Assign the captured text to a multiline string variable called "chunk"
  2. Make a list of the keys you are looking for
  3. Loop over each line, check whether it starts with one of those keys, and print the line if it does:

    chunk = '''
    <script type="text/javascript">
    var dataObject = {
    .........blah blah.......
      plu: '282013',
      sku: '653460',
      category: 'Footwear',
      description: 'Mayfly Lite Pinnacle Women&#039;s',
      colour: '',
      brand: 'Nike',
      ..... blah .......
      };'''
    
    keys = ['plu', 'sku', 'description', 'brand']
    
    for line in chunk.splitlines():
      # The text before the first colon is the key name
      if line.split(':')[0].strip() in keys:
        print(line.strip())
    

Result:

plu: '282013',
sku: '653460',
description: 'Mayfly Lite Pinnacle Women&#039;s',
brand: 'Nike',

You could then clean up the result with further calls to split, strip, replace, etc., for example to drop the quotes and trailing commas.
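Since the question mentions several products across multiple requests, here is a minimal sketch of a reusable variant that returns the cleaned values as dictionaries instead of printing raw lines. It is not part of the original answer: the names `extract_items`, `WANTED`, `PAIR_RE` and `SAMPLE` are made up for illustration, and it assumes every value is single-quoted with no embedded quotes and that each entry in `items` is a flat `{...}` block, as in the text shown in the question.

```python
import html
import re

# Fields requested in the question.
WANTED = ("plu", "sku", "description", "brand")

# Matches lines such as  plu: '282013',  and captures the key and the value.
# Assumption: values are single-quoted and contain no single quotes.
PAIR_RE = re.compile(r"^\s*(\w+)\s*:\s*'([^']*)'", re.MULTILINE)


def extract_items(script_text, wanted=WANTED):
    """Return one dict per entry of the items array, keeping only wanted keys."""
    # Only look at the text after "items:" so top-level keys are ignored.
    _, _, items_text = script_text.partition("items:")
    products = []
    # Assumption: each item is a flat {...} block without nested braces.
    for block in re.findall(r"\{(.*?)\}", items_text, re.DOTALL):
        pairs = {key: html.unescape(value) for key, value in PAIR_RE.findall(block)}
        products.append({key: pairs.get(key) for key in wanted})
    return products


# Shortened copy of the script text from the question, used as sample input.
SAMPLE = """<script type="text/javascript">
var dataObject = {
platform: 'desktop',
orderTotal: '92.99',
items: [
    {
        plu: '282013',
        sku: '653460',
        category: 'Footwear',
        description: 'Mayfly Lite Pinnacle Women&#039;s',
        colour: '',
        brand: 'Nike',
        sale: 'false'
    }   ]
};
"""

if __name__ == "__main__":
    for product in extract_items(SAMPLE):
        print(product)
```

Calling `extract_items` on the script text from each response gives a list with one dictionary per product, and `html.unescape` turns `&#039;` back into an apostrophe. If the page ever uses double quotes or nested objects, the regular expressions would need adjusting.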

Upvotes: 1
