Reputation: 31
I've been trying to get Puppeteer to launch a unique instance for every profile stored in a .json file. Currently I am stuck creating a new folder with all my code and a unique .json file for every account/instance I want to run. I'd prefer to store all my info in one .json file and have my code launch a unique instance for each profile.
Goal:
Example: Puppeteer instance 1 launches with profile 1, Puppeteer instance 2 launches with profile 2, etc.
Example of settings.json
[
{
"email": "[email protected]"
},
{
"email": "[email protected]"
},
{
"email": "[email protected]"
}
]
Example of main.js
const fs = require('fs');
const puppeteer = require('puppeteer');
const profile = JSON.parse(fs.readFileSync('./settings.json'));
var id = 0
while (id <= 2) {
emailInfo = profile[id].email;
console.log(emailInfo)
botRun()
id++;
}
function botRun() {
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.waitForTimeout(500)
console.log('function ' + emailInfo) //pretend this is page.type --> it would result in '[email protected]' for all instances since this is what the var is now but I want it to stay with the info in the loop
await browser.close();
})();
}
Obviously this is wrong, since the emailInfo variable is updated on every loop iteration, so each Puppeteer instance ends up using the latest value. Is there any way I can make each Puppeteer instance stick with its own data?
Edit 1:
I managed to get the workaround working, but now I seem to have run into a new issue. At one point in my script I tell the browser to close the tab and open a new one. It closes each tab in each individual browser fine, but when I use "await browser.newPage();" it sends all the new tabs to just 1 browser instead of keeping them in their respective browsers.
const puppeteer = require('puppeteer-extra');
const fs = require('fs');
const botRun = async emailInfo => {
browser = await puppeteer.launch({
args: [],
headless: false,
ignoreHTTPSErrors: true,
slowMo: 5,
});
const page = await browser.newPage();
await page.waitForTimeout(2500)
// do stuff with emailInfo
await page.close(); // works fine - will close tab for each browser
await browser.newPage(); // suddenly sends all tabs to 1 browser
};
(async () => {
const profile = JSON.parse(fs.readFileSync("./settings.json"));
await Promise.all(profile.map(({email}) => botRun(email)));
})();
Here is an image for clarification. My goal is to keep the tabs in their respective browser rather than suddenly all being thrown to 1 browser:
Upvotes: 1
Views: 4032
Reputation: 56885
Put the loop into the Puppeteer code or pass the emailInfo as a parameter to the function.
If you want to run tasks in succession:
const fs = require("fs");
const puppeteer = require("puppeteer");
(async () => {
const profile = JSON.parse(fs.readFileSync("./settings.json"));
const browser = await puppeteer.launch();
for (const {email: emailInfo} of profile) {
const page = await browser.newPage();
await page.waitForTimeout(500)
// do stuff with emailInfo
await page.close();
}
await browser.close();
})();
If you want to run all tasks in parallel:
(async () => {
const profile = JSON.parse(fs.readFileSync("./settings.json"));
const browser = await puppeteer.launch();
await Promise.all(profile.map(async ({email: emailInfo}) => {
const page = await browser.newPage();
await page.waitForTimeout(500)
// do stuff with emailInfo
await page.close();
}));
await browser.close();
})();
If // do stuff with emailInfo is a very long chunk of code, use a function (as you originally attempted) and give it emailInfo as a parameter. This most closely matches what you were originally going for (open a new browser per email):
const botRun = async emailInfo => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.waitForTimeout(500)
// do stuff with emailInfo
await browser.close();
};
(async () => {
const profile = JSON.parse(fs.readFileSync("./settings.json"));
for (const {email} of profile) {
await botRun(email); // one at a time
}
})();
Or run all emails at once:
// botRun is the same as above
(async () => {
const profile = JSON.parse(fs.readFileSync("./settings.json"));
await Promise.all(profile.map(({email}) => botRun(email)));
})();
The semantics of the first two snippets are a bit different than your code, but I doubt it makes sense to generate and destroy a whole browser process for every request. Prefer opening a new page (tab) in the current browser unless you have a good reason to do otherwise.
Also, neither pattern is great if you have large inputs: the sequential approach is likely too slow, and the parallel approach is likely too fast (opening 4000 browsers at once isn't fun). Consider a task queue for such cases so you can do some parallel work but keep it bounded to a sensible degree. puppeteer-cluster has such a task queue.
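To illustrate the idea of bounded parallelism without pulling in a library, here is a minimal sketch of a concurrency limiter in plain JavaScript. The name runWithLimit and its parameters are hypothetical (they are not part of Puppeteer or puppeteer-cluster), and this is only a rough stand-in for what a real task queue provides:

```javascript
// Hypothetical helper: run `worker` over `items`, keeping at most
// `limit` tasks in flight at any time. Results keep the input order.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  // Claiming (`next++`) is synchronous, so two runners never take the same item.
  const runner = async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Assuming botRun is defined as above, something like runWithLimit(profile, 3, ({email}) => botRun(email)) would keep no more than three browsers open at once. Note that if any worker rejects, the whole call rejects, so the worker should handle its own errors.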
jfriend00 has covered most of the critical points beyond this (avoid globals, avoid var, etc.), but I'd also like to add that you almost never need loops with counter variables. If you do use a loop with a counter, prefer for loops to while. Loops with counters are verbose and tend to lead to off-by-one errors. JS offers many iteration abstractions like map, forEach and for..of loops that are clean, semantic and less error-prone.
Also, the above code omits error handling, but try-catch is pretty much essential when calling Puppeteer functions that can time out. You don't want to crash your app ungracefully if an operation takes a little longer than you expect or a server is down. Use a finally block to ensure you call browser.close().
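One way to package the try/finally pattern is a small wrapper that owns the browser's lifetime. This is only a sketch: withBrowser is a hypothetical helper name, and launch is an injected function (for example () => puppeteer.launch()) so the wrapper itself doesn't depend on a particular Puppeteer import:

```javascript
// Hypothetical helper: open a browser, run `task` with it, and always
// close the browser afterwards, even if `task` throws or times out.
// `launch` is assumed to be a function such as () => puppeteer.launch().
async function withBrowser(launch, task) {
  let browser;
  try {
    browser = await launch();
    return await task(browser);
  } finally {
    // Runs on success and on failure; skipped only if launch itself failed.
    if (browser) {
      await browser.close();
    }
  }
}
```

The caller still decides what to do with errors, for example by wrapping the withBrowser call in its own try-catch and logging or retrying there.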
Finally, page.waitForTimeout is deprecated and has since been removed from newer Puppeteer versions. There are better ways to delay your script until a condition is met. See puppeteer: wait N seconds before continuing to the next line for further discussion.
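If you really do need a fixed pause where page.waitForTimeout used to be, a plain promise-based sleep is a common stand-in. The helper name sleep is my own choice here, not a Puppeteer API:

```javascript
// Resolve after roughly `ms` milliseconds; a drop-in for a fixed delay.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Example use inside an async function:
//   await sleep(500);
```

Waiting for an actual condition (for example with page.waitForSelector) is usually more reliable than sleeping for a guessed duration.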
See also Crawling multiple URLs in a loop using Puppeteer.
Upvotes: 3
Reputation: 707238
In a nutshell, don't let asynchronous operations running in parallel use higher-scoped variables that are "shared". That is the crux of your problem, as you have a loop of asynchronous operations all attempting to use the emailInfo variable, so they will stomp on each other.
Don't make emailInfo a higher-scoped variable like you are (actually, even worse, you weren't declaring it at all, which made it an implicit global - very bad). Pass it as a function argument into the specific functions you want to use it in, or declare it with let within the scope you want to use it in. Then it will have a separate value in each place it is being used. Your problem is that you have one variable and a number of asynchronous things all trying to use it. That will always cause a problem in JavaScript.
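Here is a small sketch of that failure mode with no Puppeteer involved, so it can be run on its own. The function names and the sleep helper are made up for illustration; the delay stands in for any awaited browser call:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Anti-pattern: every task reads one shared variable after it resumes.
async function sharedVariable(emails) {
  let current; // shared by all tasks
  const seen = [];
  const tasks = [];
  for (const email of emails) {
    current = email;
    tasks.push((async () => {
      await sleep(10); // by the time this resumes, the loop has finished
      seen.push(current);
    })());
  }
  await Promise.all(tasks);
  return seen;
}

// Fix: each task receives its own value as an argument.
async function passedAsArgument(emails) {
  const seen = [];
  const task = async email => {
    await sleep(10);
    seen.push(email);
  };
  await Promise.all(emails.map(email => task(email)));
  return seen;
}
```

With three inputs, sharedVariable records the last email three times, while passedAsArgument records each email exactly once.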
Also, don't use var any more. Use let or const. Both of those are block-scoped rather than function-scoped, so you can more finely control their scope. You can always declare a variable with let at the top of a function if you really want a function-scoped variable.
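The scoping difference is easiest to see with callbacks created in a loop. This is a generic illustration (the function names are invented for the example), not code from the question:

```javascript
// With var there is a single function-scoped `i`, so every callback
// sees its final value once the loop has finished.
function callbacksWithVar() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map(fn => fn());
}

// With let each iteration gets its own block-scoped `i`.
function callbacksWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map(fn => fn());
}

console.log(callbacksWithVar()); // [ 3, 3, 3 ]
console.log(callbacksWithLet()); // [ 0, 1, 2 ]
```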
If the real problem you're trying to solve is that you want to use emailInfo inside of botRun(), then just pass in that value:
const fs = require('fs');
const puppeteer = require('puppeteer');
const profile = JSON.parse(fs.readFileSync('./settings.json'));
let id = 0;
while (id <= 2) {
console.log(profile[id].email);
botRun(profile[id].email);
id++;
}
async function botRun(emailInfo) {
let browser;
try {
browser = await puppeteer.launch();
const page = await browser.newPage();
await page.waitForTimeout(500);
console.log('function ' + emailInfo);
} catch(e) {
console.log(e);
// decide what you're doing upon errors here
} finally {
if (browser) {
await browser.close();
}
}
}
Also, there's no need for the extra function inside of botRun(). You can just make botRun() async and that will work fine. And you need some proper error handling in case any of the await statements encounters a rejected promise.
Upvotes: 1