Reputation: 1
I have been asked to maintain a daily report in which I keep track of the number of good, poor and needs-improvement URLs every alternate day. The way this works is that today we get the data for the previous day, and so on. Recently, I was asked to automate the data extraction using APIs or Selenium. Is anyone aware of how to extract just the counts of good, poor and needs-improvement URLs every day?
So far, I have gone through the documentation for the GSC API and, from my understanding, there is no endpoint for extracting Core Web Vitals data. I have also gone through the documentation for the CrUX API, and from what I understood it would not give me the counts of good, poor and needs-improvement URLs either. For now I have resorted to using Puppeteer to scrape the required data off the GSC UI, but I am stuck at a blocker there: I can log in, open GSC, select the property and go to the Core Web Vitals report, but after that I am not able to get the text content for the good, poor and needs-improvement URL counts through any selector, even though the XPath exists. Can I please get some help to understand whether there is something wrong with the code I have written, or whether this is a Google Search Console security measure?
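For context on the CrUX point: the request described in the docs looks roughly like this sketch (the origin and API key are placeholders, and fetch assumes Node 18+). From what I can tell it only returns per-metric histograms for a whole origin or a single URL, not counts of good / poor / needs-improvement URLs, which is why it does not fit the report.

const queryCrux = async () => {
  // CrUX API endpoint from the public docs; CRUX_API_KEY and the origin are placeholders.
  const response = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${process.env.CRUX_API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        origin: "https://www.example.com",
        formFactor: "PHONE",
        metrics: ["largest_contentful_paint"],
      }),
    }
  );
  const data = await response.json();
  // Histogram densities for good / needs improvement / poor, aggregated over the
  // whole origin; there is no per-URL breakdown in this response.
  console.log(data.record.metrics.largest_contentful_paint.histogram);
};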
Below is the element (I want to fetch the "0 poor URLs" text) and the code that I have written:
<div class="itpf1b"><div class="FZD9xd lA8qmc"></div><div class="qL2dyd dzkZrb">0 poor URLs</div></div>
const fetchUrlMetrics = async (page) => {
  try {
    // Full CSS path to the node that holds the "0 poor URLs" text.
    const poorUrlsSelector =
      "body > div:nth-child(9) > c-wiz:nth-child(6) > div:nth-child(1) > div:nth-child(2) > div:nth-child(2) > div:nth-child(1) > c-wiz:nth-child(1) > div:nth-child(1) > c-wiz:nth-child(2) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2)";

    console.log("Waiting for poor URLs selector...");
    await page.waitForSelector(poorUrlsSelector, { visible: true });
    console.log("Poor URLs selector is visible.");

    // Read the element's text inside the page context.
    const poorUrlsText = await page.evaluate((selector) => {
      const element = document.querySelector(selector);
      return element ? element.innerText : null;
    }, poorUrlsSelector);

    console.log("Fetched poor URLs text:", poorUrlsText);

    if (poorUrlsText) {
      // Strip everything but digits, e.g. "0 poor URLs" -> 0.
      const poorCount = parseInt(poorUrlsText.replace(/\D/g, ""), 10);
      console.log(`Poor URLs: ${poorCount}`);
    } else {
      console.log("No poor URLs found.");
    }
  } catch (error) {
    console.error("Error fetching URL metrics:", error);
  }
};
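For completeness, a text-matching variant of the same lookup, instead of the long CSS chain, would look roughly like this (just a sketch; it assumes the element really renders the literal text "poor URLs" and lives in the main document rather than in an iframe):

const fetchPoorUrlsByText = async (page) => {
  // Scan every div and keep the leaf node whose own text ends in "poor URLs".
  const poorUrlsText = await page.evaluate(() => {
    const divs = Array.from(document.querySelectorAll("div"));
    const match = divs.find(
      (el) => el.children.length === 0 && /poor URLs$/.test(el.textContent.trim())
    );
    return match ? match.textContent.trim() : null;
  });
  console.log("Fetched poor URLs text:", poorUrlsText);
  return poorUrlsText;
};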
Upvotes: 0
Views: 56
Reputation: 3409
If you are going to those lengths to scrape highly aggregated data (CrUX) from an admin tool, why don't you instead gather detailed data directly from your website:
https://github.com/GoogleChrome/web-vitals
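A minimal sketch of what that looks like in your page's own JavaScript (the /analytics endpoint is a placeholder for wherever you collect the beacons; metric.rating already comes back as "good", "needs-improvement" or "poor", which maps straight onto the buckets in your daily report):

import { onCLS, onINP, onLCP } from "web-vitals";

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,      // "CLS", "INP", "LCP"
    value: metric.value,
    rating: metric.rating,  // "good" | "needs-improvement" | "poor"
    page: location.pathname,
  });
  // sendBeacon survives page unload; fall back to fetch with keepalive.
  (navigator.sendBeacon && navigator.sendBeacon("/analytics", body)) ||
    fetch("/analytics", { body, method: "POST", keepalive: true });
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);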
Upvotes: 0