Vasilis Skentos

Reputation: 556

iterate through div elements

I'm a complete beginner in JavaScript and in web scraping with Puppeteer, and I am trying to get the scores of a single EuroLeague round from https://www.euroleague.net/main/results?gamenumber=28&phasetypecode=RS&seasoncode=E2019


By inspecting the score list on the page, I found that it is a div element that contains other divs with the displayed stats.

HTML for a single match between two teams (there are more divs for other matches below this example):

<!-- score list -->
    <div class="wp-module wp-module-asidegames wp-module-5lfarqnjesnirthi">
<!-- the data-code increases to "euro_245" ... -->
    <div class="">      
        <div class="game played" data-code="euro_244" data-date="1583427600000" data-played="1">
            <a href="/main/results/showgame?gamecode=244&amp;seasoncode=E2019" class="game-link">
                <div class="club">
                    <span class="name">Zenit St Petersburg</span> 
                        <span class="score homepts winner">76</span>
                </div>
                <div class="club">
                    <span class="name">Zalgiris Kaunas</span>  
                        <span class="score awaypts ">75</span>
                </div>
                <div class="info">

                        <span class="date">March 5 18:00 CET</span>
                    <span class="live">
                        LIVE <span class="minute"></span>
                    </span>
                    <span class="final">
                        FINAL
                    </span>
                </div>
            </a>
        </div>
        <!-- more teams -->
    </div>
        
</div>

What I want is to iterate through the outer div element, get the teams playing and the score of each match, and store them in a JSON file. However, since I am a complete beginner, I do not understand how to iterate through the HTML above. This is my web scraping code to get the element:

const puppeteer = require('puppeteer');


const sleep = (delay) => new Promise((resolve) => setTimeout(resolve,delay));

async function getTeams(url){
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);

  await sleep(3000);

  const games = await page.$x('//*[@id="main-one"]/div/div/div/div[1]/div[1]/div[3]');
  //this is where I will execute the iteration part to get the matches with their scores 
  await sleep(2000);

  await browser.close();
}

getTeams('https://www.euroleague.net/main/results?gamenumber=28&phasetypecode=RS&seasoncode=E2019');

I would appreciate your help with the iteration part. Thank you in advance.

Upvotes: 1

Views: 726

Answers (1)

theDavidBarton

Reputation: 8841

The most accurate selector for a game box is div.game.played (a div that has both the .game and the .played CSS classes). You will need to count the elements that match this selector. This is possible with page.$$eval (page.$$eval(selector, pageFunction[, ...args])), which runs Array.from(document.querySelectorAll(selector)) within the page and passes the result as the first argument to pageFunction.

Because we use the element indexes to pick the specific data fields, we run a regular for loop over the number of matched elements.

If you need a specific range of "euro_xyz" codes, you can read the data-code attribute values inside page.evaluate with Element.getAttribute and compare their numeric part against the desired "xyz" number.
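
The data-code check described above can be isolated into a small helper. This is only a sketch: parseGameNumber and isInRange are hypothetical names, and the "euro_<number>" format is assumed from the HTML in the question.

```javascript
// Hypothetical helper: extracts the numeric part of a data-code such as "euro_244".
// Returns NaN when the value does not follow the assumed "euro_<number>" pattern.
function parseGameNumber(dataCode) {
  const match = /^euro_(\d+)$/.exec(dataCode || '')
  return match ? parseInt(match[1], 10) : NaN
}

// Hypothetical helper: true when the game number lies in [first, last] (inclusive).
function isInRange(dataCode, first, last) {
  const n = parseGameNumber(dataCode)
  return Number.isInteger(n) && n >= first && n <= last
}

console.log(isInRange('euro_244', 244, 252)) // true
console.log(isInRange('euro_243', 244, 252)) // false
```

Keeping the parsing separate from the scraping makes it possible to check the range logic without launching a browser.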

To collect each game's data, we can define a collector array (gameObj) that grows with each iteration. In each iteration we fill an actualGame object with that game's data.

It is important to determine which child elements contain the corresponding data values. For example, the home club's name is at 'div.game.played > a > div:nth-child(1) > span:nth-child(1)': the div child number selects the club, while the span child number chooses between the club name and the points. The loop index [i] grabs the values of the right game box (that is why the elements were counted at the beginning).

For example:

const allGames = await page.$$('div.game.played')
const allGameLength = await page.$$eval('div.game.played', el => el.length)
const gameObj = []
for (let i = 0; i < allGameLength; i++) {
  try {
    let dataCode = await page.evaluate(el => el.getAttribute('data-code'), allGames[i])
    dataCode = parseInt(dataCode.replace('euro_', ''), 10)

    if (dataCode > 243) {
      const actualGame = {
        homeClub: await page.evaluate(el => el.textContent, (await page.$$('div.game.played > a > div:nth-child(1) > span:nth-child(1)'))[i]),
        awayClub: await page.evaluate(el => el.textContent, (await page.$$('div.game.played > a > div:nth-child(2) > span:nth-child(1)'))[i]),
        homePoints: await page.evaluate(el => el.textContent, (await page.$$('div.game.played > a > div:nth-child(1) > span:nth-child(2)'))[i]),
        awayPoints: await page.evaluate(el => el.textContent, (await page.$$('div.game.played > a > div:nth-child(2) > span:nth-child(2)'))[i]),
        gameDate: await page.evaluate(el => el.textContent, (await page.$$('div.game.played > a > div:nth-child(3) > span:nth-child(1)'))[i])
      }
      gameObj.push(actualGame)
    }
  } catch (e) {
    console.error(e)
  }
}

console.log(JSON.stringify(gameObj))
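
As an alternative sketch (not the approach in the code above), the same fields could be collected in a single page.$$eval call, which avoids querying the page again for every field of every game. The class names (.club, .name, .score, .date) are taken from the HTML in the question; extractGames, scrapeGames, saveGames and the output file name are hypothetical, and page is assumed to be a Puppeteer page that has already loaded the results URL.

```javascript
const fs = require('fs')

// Runs inside the browser when passed to page.$$eval, so it must not
// reference anything outside its own body. `gameEls` is the array of
// elements that matched the selector.
function extractGames(gameEls) {
  const text = (root, selector) => {
    const el = root ? root.querySelector(selector) : null
    return el ? el.textContent.trim() : null
  }
  return gameEls.map(game => {
    const clubs = game.querySelectorAll('.club')
    return {
      code: game.getAttribute('data-code'),
      homeClub: text(clubs[0], '.name'),
      homePoints: text(clubs[0], '.score'),
      awayClub: text(clubs[1], '.name'),
      awayPoints: text(clubs[1], '.score'),
      gameDate: text(game, '.date')
    }
  })
}

// Assumption: `page` is a Puppeteer Page showing the results list.
async function scrapeGames(page) {
  return page.$$eval('div.game.played', extractGames)
}

// Writes the collected games to a JSON file chosen by the caller.
function saveGames(games, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(games, null, 2))
}
```

Inside getTeams this might be used as const games = await scrapeGames(page) followed by saveGames(games, 'games.json'), which also covers the "store them in a JSON file" part of the question.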

Puppeteer has a page.waitFor method that serves the same purpose as your sleep function (it has been deprecated in newer Puppeteer versions), but you can also wait for a selector to appear with page.waitForSelector.
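
A minimal sketch of replacing the fixed sleep with a selector wait, assuming page is a Puppeteer page. The openResults name and the 10-second default timeout are illustrative choices, not part of the original code.

```javascript
// Hypothetical helper: navigates to the results page and resolves once at
// least one game box is in the DOM, instead of sleeping for a fixed time.
async function openResults(page, url, timeoutMs = 10000) {
  await page.goto(url)
  // Rejects with a timeout error if no matching element appears in time.
  await page.waitForSelector('div.game.played', { timeout: timeoutMs })
}
```

Waiting for the element itself tends to be both faster and more reliable than a fixed delay, because the script continues as soon as the game boxes exist and fails loudly if they never appear.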

Upvotes: 1
