Reputation: 219
I am writing a script in JavaScript to scrape product information from a particular website. It works in two steps: first I collect all the product links from the site, then I scrape the information about each product from its link. I'm having problems with the first step, scraping the product links. Normally, when you open the site you have to keep scrolling down until all the products are loaded on the page. For that I'm using the following code:
await page.evaluate(async () => {
  await new Promise((resolve) => {
    let totalHeight = 0;
    const distance = 100;
    // Scroll down a little every 300ms until we pass the bottom of the page.
    const timer = setInterval(() => {
      const scrollHeight = document.body.scrollHeight;
      window.scrollBy(0, distance);
      totalHeight += distance;
      if (totalHeight >= scrollHeight) {
        clearInterval(timer);
        resolve();
      }
    }, 300);
  });
});
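For reference, once the scroll finishes I collect the links roughly like this (the selector is a placeholder for the site's actual markup):
const productLinks = await page.evaluate(() =>
  // 'a.product-link' is a placeholder; the real site uses its own selectors.
  Array.from(document.querySelectorAll('a.product-link'), (a) => a.href)
);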
This all works like a charm on my local computer running Windows 10, but I need to run it on a Linux server on a DigitalOcean droplet. When I upload the code to the Linux server, the products don't seem to load when the script scrolls the page down, and we only get the products from the first page. I tried it on Google Cloud Compute Engine as well, but no luck.
Has anyone faced a similar issue? If so, can you please help me out?
Upvotes: 0
Views: 1536
Reputation: 195
Disclaimer: my server is Ubuntu, so I hope this helps, but the details may differ slightly on your system.
First, make sure the Chromium binary that Puppeteer uses isn't missing any shared libraries. From the directory containing the chrome binary, run:
ldd chrome | grep not
Any line containing "not found" is a missing dependency, and you might find this useful for installing them all at once:
sudo apt-get install gconf-service libasound2 libatk1.0-0 libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
Again, yours may be slightly different since you're running a different system, but definitely be sure your dependencies are installed with whatever the equivalent procedure for your system is.
If that's all good, you will want to check the portion of your code that looks like this:
await page.goto('https://google.com', {waitUntil: 'networkidle2'});
and be sure that the waitUntil value is networkidle2, which treats navigation as finished only once there have been no more than two network connections for at least 500ms. Check the docs on this value.
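If the site lazy-loads products as you scroll, the loop in your question can hit the current bottom of the page before new items have arrived, which would explain why a slower server only ever sees the first batch. A sketch of a more patient variant, which keeps scrolling until the page height stops growing (the delay and retry count are arbitrary, so tune them for your site):
await page.evaluate(async () => {
  const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let previousHeight = 0;
  let retries = 0;
  while (retries < 5) {
    window.scrollBy(0, document.body.scrollHeight);
    await delay(1000); // give lazy-loaded products time to arrive
    const currentHeight = document.body.scrollHeight;
    if (currentHeight === previousHeight) {
      retries += 1; // nothing new loaded; count toward giving up
    } else {
      retries = 0; // new content appeared, so keep going
      previousHeight = currentHeight;
    }
  }
});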
You can also run Puppeteer with the slowMo launch option to get a better idea of what's happening. This option delays each operation by the specified number of milliseconds:
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250 // slow down each operation by 250ms
});
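One more thing that often bites on Linux servers: there is usually no display, so headless: false won't launch there, and if your script runs as root, Chromium refuses to start without the no-sandbox flag. A launch configuration along these lines is a common starting point on a droplet (treat the flags as suggestions rather than requirements):
const browser = await puppeteer.launch({
  headless: true, // servers generally have no display to attach to
  args: [
    '--no-sandbox', // needed when Chromium runs as root
    '--disable-setuid-sandbox'
  ]
});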
Upvotes: 1