nickcoding2


HTML element not being selected in Puppeteer

So I have an HTML excerpt from a webpage as follows:

<li class="PaEvOc tv5olb wbTnP gws-horizon-textlists__li-ed">
  <!-- random div/element stuff inside here -->
</li>
<li class="PaEvOc tv5olb gws-horizon-textlists__li-ed">
  <!-- random div/element stuff inside here as well -->
</li>

I'm not sure how to copy the HTML properly, but if you search for "events near <location>" in Google Chrome, these are the event cards I'm trying to scrape data from:

https://i.sstatic.net/fv4a4.png

To start, I'm just trying to figure out how to properly select these elements in Puppeteer:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  const page = await browser.newPage();
  page.once('load', () => console.log('Page loaded!'));
  await page.goto('https://www.google.com/search?q=events+near+poughkeepsie+today&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail');
  console.log('Hit wait for selector');
  const test = await page.waitForSelector(".PaEvOc");
  console.log('finished waiting for selector');
  const seeMoreEventsButton = await page.$(".PaEvOc");

  console.log('seeMoreEventsButton is ' + seeMoreEventsButton);
  console.log('test is ' + test);

  await browser.close();
})();

What exactly is the problem here? Any and all help much appreciated, thank you!
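When a selector never shows up, it helps to see what the page actually served instead of letting waitForSelector throw after its default 30-second timeout. A minimal sketch, assuming Puppeteer's page.waitForSelector(selector, { timeout }); the helper name waitForSelectorOrNull is hypothetical, and it swallows any error, not only timeouts:

```javascript
// Hypothetical helper: wait for a selector, but return null instead of
// throwing when it fails, so the script can inspect what the page served.
// Note: this catches *any* error from waitForSelector, not only timeouts.
async function waitForSelectorOrNull(page, selector, timeoutMs = 10000) {
  try {
    return await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    console.error(`"${selector}" not found within ${timeoutMs} ms: ${err.message}`);
    return null;
  }
}
```

For example, `const handle = await waitForSelectorOrNull(page, '.PaEvOc'); if (!handle) await page.screenshot({ path: 'debug.png' });` saves a screenshot (the file name is arbitrary) so you can check whether Google served headless Chrome a different page.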


Answers (1)

olore


I suggest reading this article on how headless Chrome gets detected and how to get around it: https://intoli.com/blog/not-possible-to-block-chrome-headless/

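Basically, the website detects that you are scraping with headless Chrome and most likely serves a page without those .PaEvOc elements, so waitForSelector never finds them. One obvious signal is the default user agent, which contains "HeadlessChrome". You can work around that check by sending a regular browser user agent.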
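Instead of hard-coding an old Chrome version in the user agent, one option (also the approach the linked article takes) is to take the browser's real user agent and strip the "Headless" part. A minimal sketch; toNonHeadlessUserAgent is a hypothetical helper name, and this only hides one detection signal, not all of them:

```javascript
// Hypothetical helper: headless Chrome reports "HeadlessChrome/<version>" in
// its default user agent; replacing that token keeps the real version number.
function toNonHeadlessUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}
```

Usage, before page.goto(): `await page.setUserAgent(toNonHeadlessUserAgent(await browser.userAgent()));`.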

Here is what I did to make your console logs print something useful: set a regular desktop Chrome user agent on the page before navigating.

const puppeteer = require('puppeteer');

(async () => {
  // Replace headless Chrome's default user agent with a desktop Chrome one
  const preparePageForTests = async (page) => {
    const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) ' +
      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36';
    await page.setUserAgent(userAgent);
  };

  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  const page = await browser.newPage();
  await preparePageForTests(page); // must run before page.goto()

  page.once('load', () => console.log('Page loaded!'));
  await page.goto('https://www.google.com/search?q=events+near+poughkeepsie+today&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail');

  console.log('Hit wait for selector');
  const test = await page.waitForSelector(".PaEvOc");

  console.log('finished waiting for selector');
  const seeMoreEventsButton = await page.$(".PaEvOc");

  console.log('seeMoreEventsButton is ' + seeMoreEventsButton);
  console.log('test is ' + test);

  await browser.close();
})();
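Once the selector matches, concatenating an element handle into a string (as the console.log calls do) won't show the element's content, and page.$ only returns the first match. A minimal sketch of pulling the text out of every matching card with page.$$eval, assuming the generated .PaEvOc class is still what Google uses (such class names can change at any time); collectCardTexts is a hypothetical helper name:

```javascript
// Hypothetical helper: page.$$eval runs the callback inside the browser with
// an array of *all* elements matching the selector, and returns its result.
async function collectCardTexts(page, selector = '.PaEvOc') {
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.innerText.trim())
  );
}
```

Call it after waitForSelector resolves, e.g. `console.log(await collectCardTexts(page));`, then narrow the selector to the parts of each card you actually need.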
