Reputation: 103
I want to scrape an image from a Wikipedia page, but the problem is that I get three URLs for the same image at once, and all three are inside the same `img` tag. I just want the `src` URL. Does anybody know how to do this?
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.wikipedia.org/');

  // Find the "Commons" link on the Wikipedia portal page
  const xpathSelector = '//span[contains(text(), "Commons")]';
  const commonsLink = await page.waitForXPath(xpathSelector);

  // Click it and wait for the Commons page to load instead of sleeping
  await Promise.all([page.waitForNavigation(), commonsLink.click()]);

  // Wait until an image link is present, then read the first matching <img>
  const selector = 'a[class="image"] > img[src]';
  await page.waitForSelector(selector);
  const images = await page.$eval(selector, node => node.innerHTML);
  console.log(images);

  await browser.close();
})();
```
XPath of the target image: `//*[@id="mainpage-potd"]/div[1]/a/img`
Upvotes: 2
Views: 3534
Reputation: 21695
I bet you "see" three URLs because you are looking at the `srcset` attribute, which lists several URLs of the same image for different screen resolutions. You could return the `src` property instead:
```javascript
const images = await page.$eval('a[class="image"] > img[src]', node => node.src);
```
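If you do want the other URLs from `srcset` (for example, the highest-resolution version), a minimal sketch of splitting the attribute value could look like this. It assumes you have already read the raw string in Puppeteer, e.g. with `node => node.getAttribute('srcset')` in `$eval`, and that candidates are separated by a comma followed by whitespace. That is a simplification of the full HTML `srcset` parsing rules, not a complete implementation. `parseSrcset`, `largestCandidate` and the example URLs are hypothetical, for illustration only.

```javascript
// Sketch: split a srcset value into { url, descriptor } candidates.
// Assumption: candidates are separated by ", " (comma + whitespace); URLs
// that themselves contain ", " would be split incorrectly.
function parseSrcset(srcset, baseUrl) {
  return srcset
    .split(/,\s+/)
    .map((candidate) => candidate.trim())
    .filter((candidate) => candidate.length > 0)
    .map((candidate) => {
      // A candidate is "URL [descriptor]"; a missing descriptor means 1x
      const [url, descriptor = '1x'] = candidate.split(/\s+/);
      // Resolve protocol-relative URLs such as //upload.example.org/...
      return { url: new URL(url, baseUrl).href, descriptor };
    });
}

// Pick the candidate with the largest descriptor value ("2x" beats "1.5x").
// Assumes a non-empty array whose descriptors all use the same unit (x or w).
function largestCandidate(candidates) {
  return candidates.reduce((best, c) =>
    parseFloat(c.descriptor) > parseFloat(best.descriptor) ? c : best);
}

// Hypothetical srcset value, shaped like the ones Wikimedia pages use
const srcset =
  '//upload.example.org/330px-Photo.jpg 1.5x, //upload.example.org/440px-Photo.jpg 2x';
const candidates = parseSrcset(srcset, 'https://commons.wikimedia.org/');
console.log(largestCandidate(candidates).url);
// https://upload.example.org/440px-Photo.jpg
```

If you only need the URL the browser actually chose from `srcset` for the current screen, `node => node.currentSrc` in `$eval` returns that directly, while `node.src` returns the resolved `src` attribute.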
Upvotes: 4