Reputation: 920
I'm new to Cheerio and JS. I am trying to scrape all the pitcher names and their associated stats into a JavaScript object like this:
var pitchers = {
  name: 'Justin Verlander',
  era: 6.62,
  //etc...
}
Here is the HTML I am trying to scrape:
<tr class="">
<td class="stat-name-width"><img src="../../style/assets/img/mlb/team-logos/tigers.png" height="20"/>
<span class="pitcher-name">Justin Verlander</span>
<div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div>
<small class="text-muted pitches">(R)</small>
<small class="text-muted matchup">(@ BOS)</small></td>
<td class="stat-stat-width fantasy-blue fantasy-points"></td>
<td class="stat-stat-width">0-3</td>
<td class="stat-stat-width">6.62</td>
<td class="stat-stat-width">1.50</td>
<td class="stat-stat-width">5.82</td>
<td class="stat-stat-width">3.18</td>
<td class="stat-stat-width">2.12</td>
<td class="stat-stat-width">5.67</td>
<td class="stat-stat-width">1.03x</td>
<td class="stat-stat-width">0.96x</td>
<td class="stat-stat-width">1.09x</td>
<td class="stat-stat-width">0.90x</td>
</tr>
There are roughly 30 pitchers with this same structure on the same page.
Here is what I have so far:
test = $('span.pitcher-name').text();
This gives me all of the pitcher names run together as one string, not just one.
Obviously I'm not even close. I can't figure out how to group the elements that belong to each pitcher name into a JavaScript object. Any help is greatly appreciated!
Upvotes: 1
Views: 1331
Reputation: 153
It looks like what you want is the $().each() function.
With this function, you can iterate through each instance of a tag and execute a callback function, like so:
var someObjArr = [];
$('span.pitcher-name').each(function(i, element){
  //Get the text from cheerio.
  var text = $(this).text();
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the name property of our object with the text value.
  someObjArr[i].name = text;
});
$('div.pitcher-salary-fd').each(function(i, element){
  //Get the text from cheerio.
  var text = $(this).text();
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the salary property of our object with the text value.
  someObjArr[i].salary = text;
});
console.log(someObjArr); //[ { name: 'Justin Verlander', salary: '$7,100' } ]
One of the best parts about this function is that it runs synchronously, so it behaves much like a for-loop and is easy to follow.
Keep in mind that you can print out each matched element via $(this) inside the callback. This is especially useful when you need to figure out exactly which selector to use. For example:
$('span.pitcher-name').each(function(i, element){
  //Return the entire element.
  var pitcherNameElement = $(this);
  //Prints all of the element's properties.
  console.log(pitcherNameElement);
});
Now, retrieving more abstract things, like an array of items that are all in the same table row, gets slightly more complex. To do that, we use the $().each function on the table rows and then check each child's class for the ones we want. This way, we can reuse the same index i, as long as every row on the page is a pitcher row (a header row would shift the indexes by one).
$('tr').each(function(i, element){
  //get all child nodes (elements and whitespace text) of this table row
  var children = element.children;
  //this array will hold the matchup data
  var matchupArr = [];
  //class to extract (must equal the class attribute exactly)
  var statClass = 'stat-stat-width';
  //loop over every child of the row
  for(var j = 0; j < children.length; j++){
    var child = children[j];
    //text nodes have no attribs, so skip anything without them
    if(child.attribs != undefined && child.attribs.class == statClass){
      //push the cell's text to our matchup array ($().text() also copes with empty cells)
      matchupArr.push($(child).text());
    }
  }
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the matchups property of our object with our array.
  if(matchupArr.length > 0){
    someObjArr[i].matchups = matchupArr;
  }
});
This is a bit of a hack, but it shows the underlying concept. Cheerio's .children() and .find() methods, which take a selector and can be chained with .each(), let you run a callback on the matching children C within a parent P without walking the raw nodes yourself.
Good luck, and happy scraping!
Upvotes: 3
Reputation: 380
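Have you seen the documentation? If you scroll down, there are plenty of examples of how to traverse a page's elements.
For example:
$('span.pitcher-name').next()
//selects the element right after each name: <div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div>
$('span.pitcher-name').nextAll('small.pitches')
//selects the <small class="text-muted pitches">(R)</small> that follows each name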
Upvotes: 0