Reputation: 39
I can currently scrape JavaScript data from a POST request that I send with requests and parse with BeautifulSoup. However, I only want the product plu, sku, description and brand. I am struggling to find a way to print just the data I need rather than the whole script. Below is the text that is printed after I extract the script with BeautifulSoup. I will be scraping more than one product from multiple POST requests, so the chunk idea is not really suitable.
<script type="text/javascript">
var dataObject = {
    platform: 'desktop',
    pageType: 'basket',
    orderID: '',
    pageName: 'Basket',
    orderTotal: '92.99',
    orderCurrency: 'GBP',
    currency: 'GBP',
    custEmail: '',
    custId: '',
    items: [
        {
            plu: '282013',
            sku: '653460',
            category: 'Footwear',
            description: 'Mayfly Lite Pinnacle Women's',
            colour: '',
            brand: 'Nike',
            unitPrice: '90',
            quantity: '1',
            totalPrice: '90',
            sale: 'false'
        } ]
};
As you can see, that is far more information than I need.
Upvotes: 0
Views: 1445
Reputation: 8966
How about this:
Loop over each line to check if the line has a term that you want, and then print out that term:
chunk = '''
<script type="text/javascript">
var dataObject = {
.........blah blah.......
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
..... blah .......
};'''

keys = ['plu', 'sku', 'description', 'brand']

# Print only the lines whose key (the text before the first colon) is wanted
for line in chunk.splitlines():
    if line.split(':')[0].strip() in keys:
        print(line.strip())
Result:
plu: '282013',
sku: '653460',
description: 'Mayfly Lite Pinnacle Women's',
brand: 'Nike',
You could obviously clean up the result using similar applications of split, strip, replace, etc.
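As a rough sketch of that clean-up, the same line-by-line idea can be wrapped in a function that returns one dict per product instead of printing raw lines. The function name extract_items, the "a repeated key starts a new product" rule, and the second product in the sample are all assumptions made for illustration; they are not from the original page.

```python
def extract_items(script_text, keys=("plu", "sku", "description", "brand")):
    """Collect the wanted key/value pairs, one dict per product."""
    items, current = [], {}
    for line in script_text.splitlines():
        # partition splits on the first colon only, so colons inside a value survive
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in keys:
            continue
        # remove surrounding whitespace, the trailing comma, then the outer quotes
        value = value.strip().rstrip(",").strip()
        if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
            value = value[1:-1]
        # assumption: seeing a key again means the next product has started
        if key in current:
            items.append(current)
            current = {}
        current[key] = value
    if current:
        items.append(current)
    return items


# The second product below is made up purely to show the grouping.
sample = """items: [
{
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
brand: 'Nike',
}, {
plu: '000000',
sku: '111111',
description: 'Example Shoe',
brand: 'ExampleBrand',
} ]"""

for item in extract_items(sample):
    print(item)
```

Because it takes the script text as an argument, the same function could be called once per response when there are several POST requests. It relies on each key sitting on its own line, as in the output shown in the question.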
Upvotes: 1