CiscoKidx

Reputation: 920

Scrape HTML with request/cheerio into a JS object

I'm new to Cheerio and JS. I am trying to scrape all the pitcher names and their associated stats into one JavaScript object per pitcher, like this:

var pitchers = {
    name: 'Justin Verlander',
    era: 6.62,
    //etc...
}

Here is the html I am trying to scrape:

<tr class="">
<td class="stat-name-width"><img src="../../style/assets/img/mlb/team-logos/tigers.png" height="20"/>  
<span class="pitcher-name">Justin Verlander</span> 
<div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div>   
<small class="text-muted pitches">(R)</small> 
<small class="text-muted matchup">(@ BOS)</small></td>
        <td class="stat-stat-width fantasy-blue fantasy-points">
        <td class="stat-stat-width">0-3</td>
        <td class="stat-stat-width">6.62</td>
        <td class="stat-stat-width">1.50</td>
        <td class="stat-stat-width">5.82</td>
        <td class="stat-stat-width">3.18</td>
        <td class="stat-stat-width">2.12</td>
        <td class="stat-stat-width">5.67</td>
        <td class="stat-stat-width">1.03x</td>
        <td class="stat-stat-width">0.96x</td>
        <td class="stat-stat-width">1.09x</td>
        <td class="stat-stat-width">0.90x</td>
</tr> 

There are roughly 30 pitchers with this same structure on the same page.

Here is what I have so far:

test = $('span.pitcher-name').text(); gives me the names of all the pitchers run together in one string, not just one.

Obviously I'm not even close. I can't figure out how to get the data that goes with each pitcher name into a JavaScript object. Any help is uberly appreciated!

Upvotes: 1

Views: 1331

Answers (2)

TylerWaite17

Reputation: 153

It looks like what you want is the $().each() function.

With this function, you can iterate through each element matched by a selector and run a callback function on it, like so:

var someObjArr = [];

$('span.pitcher-name').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the name property of our object with the text value.
    someObjArr[i].name = text;
}); 

$('div.pitcher-salary-fd').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the salary property of our object with the text value.
    someObjArr[i].salary = text;
}); 

console.log(someObjArr); //[ { name: 'Justin Verlander', salary: '$7,100' }, ... ]

One of the best parts about this function is that it runs synchronously, so it behaves much like a for-loop and is easy to follow.
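The loops above return every value as a string (the salary comes back as '$7,100'), while the question's target object stores era as the number 6.62. Below is a minimal sketch of one way to convert such values, using a hypothetical helper parseStat (not part of cheerio). It strips a leading $ and thousands separators and returns a number when the rest parses cleanly; otherwise it returns the original text unchanged, so values like '0-3' or '1.03x' are left alone.

```javascript
// Hypothetical helper (not a cheerio API): turn scraped cell text into a number
// when it looks numeric. '$7,100' -> 7100, '6.62' -> 6.62.
// Anything else ('0-3', '1.03x') is returned unchanged.
function parseStat(text) {
  var cleaned = String(text).trim().replace(/^\$/, '').replace(/,/g, '');
  if (cleaned === '') {
    return text; // Number('') would be 0, which would hide a missing value
  }
  var value = Number(cleaned);
  return Number.isFinite(value) ? value : text;
}

// Example: clean up the kind of array the two .each() loops produce.
var scraped = [{ name: 'Justin Verlander', salary: '$7,100' }];
var cleanedRows = scraped.map(function (row) {
  return { name: row.name, salary: parseStat(row.salary) };
});
console.log(cleanedRows); // [ { name: 'Justin Verlander', salary: 7100 } ]
```

Converting after scraping keeps the .each() callbacks simple. Whether '1.03x' should become 1.03 depends on how those multipliers are used, so the sketch deliberately leaves it as text.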

Keep in mind that you can log $(this) inside the callback to inspect the matched element and its sub-elements. This is especially useful when you need to figure out exactly which selector to use. For example:

$('span.pitcher-name').each(function(i, element){

    //Return the entire element.
    var pitcherNameElement = $(this);

    //Prints all of the element's properties.
    console.log(pitcherNameElement); 

});

Retrieving data that spans a whole table row, such as all of the stat cells next to a pitcher's name, is slightly more complex. One way to do it is to call $().each on the table rows and check each child's class for the ones we want. This way, the row index lines up with the index used for the names and salaries.

$('tr').each(function(i, element){

    //get all child nodes of the table row (element is the same node as $(this)[0])
    var children = element.children;

    //this array will hold the matchup data
    var matchupArr = [];

    //class to extract
    var statClass = 'stat-stat-width';

    //for loop-ing the children
    for(var myInt=0; myInt<children.length; myInt++){

        //the next element of this child
        var next = children[myInt].next;

        //sometimes next is undefined
        if(next != undefined){

            //get the html attribs of the next element
            var attribs = next.attribs;

            //sometimes the next element has no attribs
            if(attribs != undefined){

                //class of the next element
                var myClass = attribs.class;

                //if the next element's class if the one we want
                if(myClass == statClass){

                    //push the cell's text to our matchup array ('' if the cell is empty)
                    matchupArr.push(next.children.length > 0 ? next.children[0].data : '');
                };
            };
        };
    };

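    //if undefined, create the object inside of our array.
    //(assumes every <tr> on the page is a pitcher row; a header row would shift i
    //relative to the indexes used by the name and salary loops)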
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the matchup property of our object with our array.
    if(matchupArr.length >0){
        someObjArr[i].matchups = matchupArr;
    };
});

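This is a bit of a hack, but it shows the underlying concept. Cheerio's traversal methods can also do this more directly: $(element).children('td.stat-stat-width') selects the matching cells of the current row, and .each() can run a callback over them. Note that this selector also matches the fantasy-points cell, since its class list includes stat-stat-width, whereas the exact string comparison above skips it.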
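For comparison, here is a minimal sketch of the same row-level extraction written as a plain function. It assumes it receives the raw node that cheerio passes as element to a $('tr').each() callback, with the same children, attribs and data properties the loop above relies on. The hypothetical getStatCells checks each child directly instead of each child's .next, skips whitespace text nodes, trims the text, and records '' for an empty cell rather than throwing. The hand-built row (simplified from the question's HTML) lets it run without cheerio.

```javascript
// Hypothetical helper: collect the text of each child cell whose class attribute
// is exactly `statClass`. `row` is assumed to be a parsed node like the `element`
// argument cheerio passes to a $('tr').each() callback (children, attribs, data).
function getStatCells(row, statClass) {
  var stats = [];
  for (var i = 0; i < row.children.length; i++) {
    var child = row.children[i];
    // Whitespace text nodes between tags have no attribs, so they are skipped.
    if (child.attribs && child.attribs.class === statClass) {
      // An empty <td></td> has no text child; record '' instead of throwing.
      var first = child.children[0];
      stats.push(first && first.type === 'text' ? first.data.trim() : '');
    }
  }
  return stats;
}

// Hand-built stand-in for a parsed <tr>, so the function can be tried without cheerio.
function td(cls, text) {
  return {
    type: 'tag',
    name: 'td',
    attribs: { class: cls },
    children: text === undefined ? [] : [{ type: 'text', data: text }]
  };
}
var whitespace = { type: 'text', data: '\n        ' };
var row = {
  type: 'tag',
  name: 'tr',
  attribs: { class: '' },
  children: [
    td('stat-name-width', 'Justin Verlander'), whitespace,
    td('stat-stat-width fantasy-blue fantasy-points'), whitespace,
    td('stat-stat-width', '0-3'), whitespace,
    td('stat-stat-width', '6.62'), whitespace
  ]
};

console.log(getStatCells(row, 'stat-stat-width')); // [ '0-3', '6.62' ]
```

Inside the answer's loop, this would be called as getStatCells(element, 'stat-stat-width').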

Good luck, and happy scraping!

Upvotes: 3

xhocquet

Reputation: 380

Have you seen the Cheerio documentation? If you scroll down, there are plenty of examples of how to traverse a page's elements.

For example:

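$('span.pitcher-name').next() //the <div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div> after each name

.next() returns the element immediately following each name span, so to reach the <small class="text-muted pitches"> tag you would use something like .nextAll('small.pitches').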
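To see why .next() lands on the salary <div> rather than on a <small>, here is a rough plain-JavaScript model of what .next() does for a single node. It is not cheerio's implementation: the hypothetical nextElement follows the node's next sibling pointer and skips text nodes until it reaches an element. The hand-built nodes mirror the siblings inside the question's first <td>.

```javascript
// Rough model of .next() for one node (not cheerio's code): return the first
// following sibling that is not a text node, or null if there is none.
// Nodes are assumed to look like cheerio's parsed nodes: { type, name, next }.
function nextElement(node) {
  var sibling = node.next;
  while (sibling && sibling.type === 'text') {
    sibling = sibling.next; // skip whitespace text nodes between tags
  }
  return sibling || null;
}

// Hand-built siblings from the question's first <td>:
// <span class="pitcher-name">, whitespace, <div class="... pitcher-salary-fd">,
// whitespace, <small class="text-muted pitches">
var small = { type: 'tag', name: 'small', attribs: { class: 'text-muted pitches' }, next: null };
var gapAfterDiv = { type: 'text', data: '   \n', next: small };
var div = {
  type: 'tag',
  name: 'div',
  attribs: { class: 'fantasy-blue inline fantasy-data pitcher-salary-fd' },
  next: gapAfterDiv
};
var gapAfterSpan = { type: 'text', data: ' \n', next: div };
var span = { type: 'tag', name: 'span', attribs: { class: 'pitcher-name' }, next: gapAfterSpan };

console.log(nextElement(span).name); // div
console.log(nextElement(div).name); // small
```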

Upvotes: 0
