Reputation: 143
I am trying to scrape an element from a website and display it on localhost with Puppeteer (1). When this element changes, I would like to refresh the data without opening a new browser/page with Puppeteer, and only when the element changes (2).
For my example, I use www.timeanddate.com, and the element is the time (hours and minutes). For the moment, only the first part works. I don't have a solution for the second one.
Please find my code below.
app.js
var app = require('express')();
var server = require('http').createServer(app);
var io = require('socket.io').listen(server);
var puppeteer = require('puppeteer');
app.get('/', function(req, res) {
  res.render('main.ejs');
});
server.listen(8080);

let scrape = async () => {
  var browser = await puppeteer.launch({headless: true});
  var page = await browser.newPage();
  await page.goto('https://www.timeanddate.com/worldclock/personal.html');
  await page.waitFor(300);
  //await page.click('#mpo > div > div > div > div.modal-body > div.form-submit-row > button.submit.round.modal-privacy__btn');
  var result = await page.evaluate(() => {
    return document.getElementsByClassName('c-city__hrMin')[0].innerText;
  });
  return result;
};

io.sockets.on('connection', function (socket) {
  scrape().then((value) => { // this takes a few seconds while the page is loading
    console.log(value);
    socket.emit('refresh', value);
  });
});
main.ejs
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>What time is it?</title>
<style>
a {text-decoration: none; color: black;}
</style>
</head>
<body>
<h1>Welcome !</h1>
<div id="time">loading</div>
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
var socket = io.connect('http://localhost:8080');
socket.on('refresh', function (value) {
$('#time').html(value);
});
</script>
</body>
</html>
I tried Fiverr, but it was an awful experience. I hope it will be better here :)
Thank you for helping me.
Upvotes: 2
Views: 4989
Reputation: 18866
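You want to emit an event when the data changes. There are multiple ways to do that, such as polling the page on an interval with setInterval, or watching the element with a MutationObserver so that an event fires on every change.
I will discuss both of them. But first, let's split the code for better usability. It's completely optional, but you should do it.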
/**
 * Scraper
 * Use this instead of scrape variable
 */
let browser, page;
const scraper = {
  async open() {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    const url = "https://www.timeanddate.com/worldclock/personal.html";
    await page.goto(url);
    await page.waitFor(300);
  },
  async getTime() {
    return page.evaluate(() => {
      return document.querySelector(".c-city__digitalClock").innerText; // time with seconds 5:43:22am
    });
  }
};
We can add other methods to this object later if we need to. This is not the best format, but it will help us understand the code better at this point.
Using setInterval: let's modify the connection handler. We just need to open the page once and poll for new data at some interval.
/**
 * Socket Connection Monitor
 */
io.sockets.on("connection", async function(socket) {
  // open the page once for this connection
  await scraper.open();
  // start the interval loop
  const timer = setInterval(async () => {
    // get the time every second
    const time = await scraper.getTime();
    // emit the updated time
    socket.emit("refresh", time);
  }, 1000); // polling interval in milliseconds
  // stop polling when this client disconnects
  socket.on("disconnect", () => clearInterval(timer));
});
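The polling loop above emits every second, even when the scraped text is unchanged. Since the question asks to refresh only when the element changes, here is a minimal sketch of one way to filter the emits. The helper name createChangePoller is hypothetical (it is not part of Puppeteer or Socket.IO); it only remembers the last value and skips duplicates.

```javascript
// Hypothetical helper: polls `getValue` and calls `emit` only when the value
// differs from the last one that was emitted.
function createChangePoller(getValue, emit) {
  let last; // last value sent to the client (undefined before the first emit)
  return async function tick() {
    const value = await getValue();
    if (value !== last) {
      last = value;
      emit(value);
      return true; // a new value was emitted
    }
    return false; // nothing changed, nothing emitted
  };
}

// Usage sketch inside the connection handler (assumes `scraper` and `socket`
// from the answer's code):
// const tick = createChangePoller(
//   () => scraper.getTime(),
//   (time) => socket.emit("refresh", time)
// );
// const timer = setInterval(tick, 1000);
// socket.on("disconnect", () => clearInterval(timer));
```

This still reads the page once per interval, so a change that appears and reverts between two polls would go unnoticed; it only reduces the traffic sent to the client.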
Using an event (MutationObserver): this is more advanced and much more complex, but very accurate.
You can add this inside the scraper object.
// <-- Pass the socket so it can use it
async runEvents(socket) {
  // Create a Shadow event tracker on puppeteer
  await page.exposeFunction("emitter", (...data) => {
    socket.emit(...data);
  });
  await page.evaluate(function observeDom() {
    // expose the observer which will watch
    // More details: https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
    // select the target node
    var target = document.querySelector(".c-city__digitalClock");
    // create an observer instance
    var observer = new MutationObserver(function(mutations) {
      // Do something on change
      emitter("refresh", target.innerText); // <-- trigger the event whenever there is a change
    });
    // configuration of the observer:
    var config = { childList: true, subtree: true };
    // pass in the target node, as well as the observer options
    observer.observe(target, config);
  });
}
And then your connection handler will look like this:
io.sockets.on("connection", async function(socket) {
  await scraper.open();
  await scraper.runEvents(socket); // <-- Pass the socket
});
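One caveat with the handler above: scraper.open() runs on every connection, so each new client launches another browser and overwrites the shared browser and page variables. If a single page shared by all clients is enough, a sketch like the following could start the scraper only once. The helper name runOnce is hypothetical, and passing io.sockets in place of a single socket is an assumption: it relies on io.sockets.emit broadcasting to every connected client.

```javascript
// Hypothetical helper: wraps an async `start` function so it runs at most once,
// no matter how many times (or how concurrently) it is requested.
function runOnce(start) {
  let started; // promise of the first (and only) run
  return function ensureStarted() {
    if (!started) {
      started = start();
    }
    return started;
  };
}

// Usage sketch (assumes `io` and `scraper` from the answer's code):
// const ensureScraper = runOnce(async () => {
//   await scraper.open();
//   await scraper.runEvents(io.sockets); // emitter now broadcasts to all clients
// });
// io.sockets.on("connection", () => ensureScraper());
```

With this arrangement a newly connected client receives nothing until the next change is observed, which for a clock with seconds is at most about a second.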
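How the observer version works:
page.exposeFunction registers a function named emitter inside the page; it calls socket.emit with whatever data it gets.
The MutationObserver runs inside the page and calls emitter whenever the observed element changes.
Here is a visual difference between these two: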
(I used 500ms interval and it's 60 frames per second, so the animation is not catching everything, but it's there, link to repo.)
The difference between setInterval and the event is that setInterval checks after a fixed amount of time, while the observer continuously watches for changes and fires whenever there is one.
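To make that difference concrete without a browser, here is a small simulation. It is only an illustration under stated assumptions: the function simulate and its step-based clock are hypothetical, and Node's EventEmitter stands in for the MutationObserver callback.

```javascript
const { EventEmitter } = require("events");

// Hypothetical simulation: `changes` is the sequence of values the element takes,
// one per time step; the poller only looks at the element every `pollEvery` steps.
function simulate(changes, pollEvery) {
  const source = new EventEmitter();
  let current = null; // what the element currently shows
  const seenByObserver = [];
  const seenByPolling = [];

  // Event style: the listener is called on every change.
  source.on("change", (value) => seenByObserver.push(value));

  changes.forEach((value, step) => {
    current = value;
    source.emit("change", value);
    // Polling style: read the current value only on every `pollEvery`-th step.
    if ((step + 1) % pollEvery === 0) {
      seenByPolling.push(current);
    }
  });

  return { seenByObserver, seenByPolling };
}

const result = simulate(["a", "b", "c", "d"], 2);
console.log(result.seenByObserver); // every change is seen
console.log(result.seenByPolling);  // changes between two polls are missed
```

The listener records all four values, while the poller only records the value present at each polling step, which is the trade-off described above.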
Which to choose:
If you want simplicity, go with the setInterval version.
If you need accuracy, go with the observer version.
Upvotes: 5