Reputation: 49
I am currently following the tutorial at https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921 to learn more about scraping websites using Puppeteer. The author uses the website http://books.toscrape.com/ for this purpose. The code we end up with after following the tutorial is:
const puppeteer = require('puppeteer');
let scrape = async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.goto('http://books.toscrape.com/');
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);
    const result = await page.evaluate(() => {
        let title = document.querySelector('h1').innerText;
        let price = document.querySelector('.price_color').innerText;
        return {
            title,
            price
        }
    });
    browser.close();
    return result;
};
scrape().then((value) => {
    console.log(value); // Success!
});
The output after running this code is:
{ title: 'A Light in the Attic', price: '£51.77' }
I understand all of this, but I want to go a little further. Namely, I want to extract the price 51.77 as a number and then use it for some calculations in the same script. I tried the following, but it failed:
scrape().then((value) => {
const str=value;
const fl=parseFloat(str.substring(42,46));
fl=2*fl;
console.log('result is',fl);
});
I guess I don't fully understand how the innerText property works and what it actually returns.
Upvotes: 0
Views: 711
Reputation: 19
scrape().then((value) => {
const str=value;
let fl=parseFloat(str.substring(42,46));
fl=2*fl;
console.log('result is',fl);
});
value is the result returned from scrape(), so value is an object like this:
value:{ title: 'A Light in the Attic', price: '£51.77' }
To access the price you have to use dot notation (value.price). Your code should look like this:
scrape().then((value) => {
    const str = value.price;
    let fl = parseFloat(str.slice(1)); // slice to remove the first character
    fl = 2 * fl;
    console.log('result is', fl);
});
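To see why the original attempt failed and why slicing helps, here is a minimal sketch that runs without Puppeteer. The value object is a hard-coded stand-in for what scrape() resolves with, so treat it as an assumption rather than live output:

```javascript
// Hypothetical stand-in for the object that scrape() resolves with.
const value = { title: 'A Light in the Attic', price: '£51.77' };

// value is a plain object, so it has no substring method at all.
console.log(typeof value.substring); // undefined

// parseFloat gives up if the first character cannot start a number,
// so the leading currency symbol turns the result into NaN.
const withSymbol = parseFloat(value.price);

// Dropping the first character leaves '51.77', which parses cleanly.
const withoutSymbol = parseFloat(value.price.slice(1));

console.log(withSymbol);    // NaN
console.log(withoutSymbol); // 51.77
```

Note that slice(1) assumes the price always starts with exactly one non-numeric character.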
Upvotes: 1
Reputation: 20228
Your value is not a string but an object with a title and a price property. So you can access the price via value.price.
Alternatively, you can write the argument via destructuring as {title, price} instead of value.
Also, you can't declare fl as a constant if you wish to reassign another value to it later on.
A robust way to remove the currency symbol and possibly other non-numeric symbols from the price is via regex matching:
scrape().then(({title, price}) => {
    // The dot is escaped so it only matches a literal decimal point.
    let fl = +price.match(/\d+\.\d+/)[0];
    fl = 2 * fl;
    console.log('result is', fl);
});
Depending on your needs, you might still want to handle the case when price.match returns null in case there is no valid price.
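One way to handle that null case is sketched below. The helper name parsePrice is hypothetical (it is not part of Puppeteer or the tutorial), and returning null for an unparseable price is just one possible convention:

```javascript
// parsePrice is a hypothetical helper, not part of Puppeteer or the tutorial.
function parsePrice(text) {
    // Match digits, optionally followed by a decimal point and more digits.
    const match = String(text).match(/\d+(?:\.\d+)?/);
    // String.prototype.match returns null when nothing matches.
    return match ? Number(match[0]) : null;
}

console.log(parsePrice('£51.77'));   // 51.77
console.log(parsePrice('sold out')); // null
```

Inside the then callback you could then call parsePrice(price) and only do the calculation if the result is not null.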
Upvotes: 1
Reputation: 130
I think you should parse the price value this way, and it should work:
scrape().then((value) => {
    const str = value.price;
    // Skip the leading currency symbol so parseFloat does not return NaN.
    let fl = parseFloat(str.slice(1));
    fl = 2 * fl;
    console.log('result is', fl);
});
Upvotes: 1