Reputation: 25
I'm trying to scrape some data about vulnerabilities using Node.js and Puppeteer, and I ran into an issue where some properties come back as null or empty, even though running the same selector query in the browser works (Version 87.0.4280.88 (x86_64)). Below is a snippet that reproduces the issue.
The date the vulnerability was patched, whose selector path is 'div.patched', is where I get a null error. The rest of the values work perfectly, so I am not sure what the issue is, because I am following the same logic for all of them.
const puppeteer = require('puppeteer');
const url = 'https://www.zero-day.cz/database/';
const selector = '.issue.col-md-6';
(async function () {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const articles = await page.$$eval(selector, nodes => {
    return nodes.map(node => {
      let title = node.querySelector('h3.issue-title').textContent.trim();
      title = title.replace(/[\n\r]+|[\s]{2,}/g, ", ");
      let titleDescription = node.querySelector('p.desc-title').textContent.trim();
      let description = node.querySelector('div.description.for-l').textContent.trim();
      description = description.replace(/[\n\r]+|[\s]{2,}/g, " ");
      let timeDiscovered = node.querySelector('div.discavered').textContent.trim();
      let timePatched = node.querySelector('.patched').textContent;
      return {
        title,
        titleDescription,
        description,
        timeDiscovered,
        timePatched
      };
    });
  });
  // Write to the console
  console.log(articles);
  await browser.close();
})();
Error message:
Error: Evaluation failed: TypeError: Cannot read property 'textContent' of null
Upvotes: 1
Views: 5873
Reputation: 13782
Sections for issues that are not patched have no elements matching the 'div.patched' selector; they have 'div.not-patched' instead. You can check whether node.querySelector('.patched') is null and then return any fallback value for timePatched, or use node.querySelector('.not-patched').textContent.
For example, replace:
let timePatched = node.querySelector('.patched').textContent;
with:
let timePatched = node.querySelector('.patched')?.textContent || "Not patched.";
to return the string "Not patched." as timePatched for issues that are not patched.
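If optional chaining is not available in your environment, the same null check can be written out explicitly. The following is only a minimal sketch: textOrFallback and fakeNode are hypothetical names made up for illustration (they are not part of Puppeteer), and the sample date is invented rather than taken from the site.

```javascript
// Hypothetical helper: returns the trimmed text of the first descendant
// matching `selector`, or `fallback` when no element matches.
function textOrFallback(node, selector, fallback = null) {
  const el = node.querySelector(selector);
  return el ? el.textContent.trim() : fallback;
}

// Stand-in for a DOM node so the helper can be tried in plain Node.js.
// It only answers querySelector for the selectors listed in `map`.
function fakeNode(map) {
  return {
    querySelector: sel => (sel in map ? { textContent: map[sel] } : null)
  };
}

// Made-up sample data: one patched issue, one not-patched issue.
const patched = fakeNode({ '.patched': ' 2020-12-08 ' });
const notPatched = fakeNode({ '.not-patched': ' Not patched ' });

console.log(textOrFallback(patched, '.patched', 'Not patched.'));    // 2020-12-08
console.log(textOrFallback(notPatched, '.patched', 'Not patched.')); // Not patched.
```

In the real script the helper would have to be declared inside the callback passed to page.$$eval, because that callback is serialized and run in the page, where functions from the Node.js scope are not visible.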
Upvotes: 3