Reputation: 45
I have a working puppeteer script that I'd like to make into an API but I'm having problems with waitForSelector.
Background:
I wrote a puppeteer script that successfully searches for and scrapes the result of a query I specify in the code e.g. let address = xyz;
. Now I'd like to make it into an API so that a user can query something. I managed to code everything necessary for the local API (working with express) and everything works as well. By that I mean: I coded all the server side stuff: I can make a request, the scraper function is called, puppeteer starts up, carries out my search (I need to type in an address, choose from a dropdown and press enter).
The status: The result of my query is a form (basically 3 columns and some rows) in an iFrame and I want to scrape all the rows (I modify them into a specific json later on). The way it works is I use waitForSelector on the form's selector and then I use frame.evaluate.
Problem: When I run my normal scraper everything works well, but when I run the (slightly modified but essentially same) code within the API framework, waitForSelector suddenly always times out. I have tried all the usual workarounds: waitForNavigation, taking a screenshot and inspecting etc but nothing helped. I've been reading quite a bit and could it be that I'm screwing something up in terms of async/await when I call my scraper from within the context of the API? I'm still quite new to this so please bear with me. This is the code of the working script - I indicated the important part
const puppeteer = require("puppeteer");
const chalk = require("chalk");
const fs = require('fs');
const error = chalk.bold.red;
const success = chalk.keyword("green");
address = 'Gumpendorfer Straße 12, 1060 Wien';
(async () => {
try {
// open the headless browser
var browser = await puppeteer.launch();
// open a new page
var page = await browser.newPage();
// enter url in page
await page.goto(`https://mein.wien.gv.at/Meine-Amtswege/richtwert?subpage=/lagezuschlag/`, {waitUntil: 'networkidle2'});
// continue without newsletter
await page.click('#dss-modal-firstvisit-form > button.btn.btn-block.btn-light');
// let everyhting load
await page.waitFor(1000)
console.log('waiting for iframe with form to be ready.');
//wait until selector is available
await page.waitForSelector('iframe');
console.log('iframe is ready. Loading iframe content');
//choose the relevant iframe
const elementHandle = await page.$(
'iframe[src="/richtwertfrontend/lagezuschlag/"]',
);
//go into frame in order to input info
const frame = await elementHandle.contentFrame();
//enter address
console.log('filling form in iframe');
await frame.type('#input_adresse', address, { delay: 100});
//choose first option from dropdown
console.log('Choosing from dropdown');
await frame.click('#react-autowhatever-1--item-0');
console.log('pressing button');
//press button to search
await frame.click('#next-button');
// scraping data
console.log('scraping')
await frame.waitForSelector('#summary > div > div > br ~ div');//This keeps failing in the API
const res = await frame.evaluate(() => {
const rows = [...document.querySelectorAll('#summary > div > div > br ~ div')];
const cells = rows.map(
row => [...row.querySelectorAll('div')]
.map(cell => cell.innerText)
);
return cells;
});
await browser.close();
console.log(success("Browser Closed"));
const mapFields = (arr1, arr2) => {
const mappedArray = arr2.map((el) => {
const mappedArrayEl = {};
el.forEach((value, i) => {
if (arr1.length < (i+1)) return;
mappedArrayEl[arr1[i]] = value;
});
return mappedArrayEl;
});
return mappedArray;
}
const Arr1 = res[0];
const Arr2 = res.slice(1,3);
let dataObj = {};
dataObj[address] = [];
// dataObj['lagezuschlag'] = mapFields(Arr1, Arr2);
// dataObj['adresse'] = address;
dataObj[address] = mapFields(Arr1, Arr2);
console.log(dataObj);
} catch (err) {
// Catch and display errors
console.log(error(err));
await browser.close();
console.log(error("Browser Closed"));
}
})();
I just can't understand why it would work in the one case and not in the other, even though I barely changed something. For the API I basically changed the name of the async function to const search = async (address) => {
such that I can call it with the query in my server side script.
Thanks in advance - I'm not attaching the API code cause I don't want to clutter the question. I can update it if it's necessary
Upvotes: 1
Views: 3509
Reputation: 45
I solved this myself. Turns out the problem wasn't as complicated as I thought and it was annoyingly simple to solve. The problem wasn't with the selector that was timing out but with the previous selectors, specifically the typing and choosing from dropdown selectors. Essentially, things were going too fast. Before the search query was typed in, the dropdown was already pressed and nonsense came out. How I solved it: I included a waitFor(1000) call before the dropdown is selected and everything went perfectly. An interesting realisation was that even though that one selector timed out, it wasn't actually the source of the problem. But like I said, annoyingly simple and I feel dumb for asking this :) but maybe someone will see this and learn from my mistake
Upvotes: 3