user1692342

Reputation: 5237

Parsing HTML File using cheerio

I have an HTML document that I would like to parse, and I am trying to use cheerio to do it.

<ul data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0">
    <li class="_1ht1 _1ht2" data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz">
        .
        .
        .
        .
        <span data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0">
            My Random Text
        </span>
    </li>
</ul>

From my HTML, I am trying to extract the first ul tag with data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0".

Within that ul, from the very first li tag I want to extract the user, in this case xyz. After that, I want to get the text inside the span shown in the code.
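The user name is embedded in the li's data-reactid value, so once that attribute value has been read (for example with cheerio's .attr('data-reactid')), a regular expression can pull it out. A minimal plain-JavaScript sketch; the helper name extractUser is hypothetical, and it assumes the name always follows the literal marker "$user=" and contains no "." characters:

```javascript
// Hypothetical helper: pull the user name out of a data-reactid value.
// Assumes the user follows the literal marker "$user=" and runs until
// the next "." (or the end of the string), so names containing "." would be cut short.
function extractUser(reactId) {
  const match = /\$user=([^.]+)/.exec(reactId);
  return match ? match[1] : null; // null when there is no "$user=" segment
}

// Both the <li> id and the <span> id from the question carry the user segment.
console.log(extractUser(".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz")); // xyz
console.log(extractUser(".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0")); // xyz
```

Because the pattern matches the marker rather than a fixed value, it keeps working when xyz changes.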

Through Cheerio I tried the following:

var cheerio = require('cheerio'),
    fs = require('fs');

fs.readFile('index.html', 'utf8', dataLoaded);

function dataLoaded(err, data) {
    if (err) throw err;
    var $ = cheerio.load(data);
    console.log("Trying out " + JSON.stringify($("<ul data-reactid=\".0.1.0.0.1.1.0.0.0.0.1.0\">").data()));
}

It prints Trying out {"reactid":".0.1.0.0.1.1.0.0.0.0.1.0"}. How do I get the values inside the HTML?

Note: xyz is dynamic and it will change
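The output above suggests why the attempt returns only the attribute: the string passed to $() starts with "<", so it is parsed as new markup, creating a detached ul whose data is exactly what was written into the string, instead of searching the loaded document. Selecting the existing element needs a CSS attribute selector such as ul[data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0"]. The sketch below is a simplified, hypothetical version of that "markup or selector?" decision, modelled loosely on jQuery's check; it is not cheerio's actual code:

```javascript
// Hypothetical, simplified heuristic (not cheerio's real implementation):
// a string that begins with "<" and ends with ">" is treated as new markup;
// anything else is treated as a CSS selector against the loaded document.
function looksLikeMarkup(input) {
  const s = input.trim();
  return s.length >= 3 && s.startsWith("<") && s.endsWith(">");
}

const fromQuestion = '<ul data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0">';
const attributeSelector = 'ul[data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0"]';

console.log(looksLikeMarkup(fromQuestion));      // true: builds a new <ul>
console.log(looksLikeMarkup(attributeSelector)); // false: searches the document
```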

Upvotes: 0

Views: 2601

Answers (3)

Patel

Reputation: 1478

I think this will work for you, if I understood your question correctly:

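// Assumes $ = cheerio.load(data) has already run, as in the question
var myDataReactId = '.0.1.0.0.1.1.0.0.0.0.1.0';
var firstLi = $("ul[data-reactid='" + myDataReactId + "']").first().find('li').first();
// Full id of the li, e.g. ".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz"
var liReactId = firstLi.attr('data-reactid');
// The dynamic user name follows "$user="
var user = liReactId.split('$user=')[1];
// The span's id starts with the li's id, so a prefix match finds it
var text = firstLi.find("span[data-reactid^='" + liReactId + "']").text().trim();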
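The answer builds its selectors by string concatenation. That works for these ids, but a value containing a quote or a backslash would produce a broken selector. A hypothetical helper, attrSelector, that wraps the value in double quotes and escapes backslashes and double quotes as CSS string syntax requires (newlines inside the value are not handled in this sketch):

```javascript
// Hypothetical helper: build an attribute selector such as
// ul[data-reactid=".0.1"] with the value quoted as a CSS string.
// Backslashes are escaped first, then double quotes; characters such as
// "$", ":" and "." are safe inside the quotes.
function attrSelector(tag, attr, op, value) {
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `${tag}[${attr}${op}"${escaped}"]`;
}

console.log(attrSelector("ul", "data-reactid", "=", ".0.1.0.0.1.1.0.0.0.0.1.0"));
console.log(attrSelector("span", "data-reactid", "^=", ".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz"));
```

The op parameter lets the same helper produce exact (=), prefix (^=) or substring (*=) matches.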

Upvotes: 1

Chol Nhial

Reputation: 1397

The problem with my first answer is that I didn't actually find the element you want to extract the reactid from. With some JS fiddling I put together something that resembles your scenario. Note that in the fiddle I use .html(). Without further ado, here it is: http://jsfiddle.net/0r5k9egu/. Run the fiddle and the console should show .0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0
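In the sample HTML, each element's data-reactid extends the id of the element it sits inside: the span's id begins with the li's id, which begins with the ul's id. That is why the span id printed by the fiddle still contains $user=xyz, and why a prefix or substring attribute selector can reach the span from the li. A small sketch of that relationship; the helper isNestedId is hypothetical and assumes, as the sample suggests, that a child id is the parent id followed by "." and further segments:

```javascript
// Hypothetical helper: true when childId is nested under parentId, assuming
// child ids are the parent id followed by "." and more segments.
function isNestedId(parentId, childId) {
  return childId.startsWith(parentId + ".");
}

const ulId = ".0.1.0.0.1.1.0.0.0.0.1.0";
const liId = ".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz";
const spanId = ".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0";

console.log(isNestedId(ulId, liId));   // true
console.log(isNestedId(liId, spanId)); // true
console.log(isNestedId(spanId, liId)); // false
```

Requiring the trailing "." keeps an id like .0.1 from being treated as the parent of .0.10.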

Upvotes: 0

Chol Nhial

Reputation: 1397

Try this. It turns your HTML string into an object that jQuery (or cheerio) can work with, then finds the span inside the unordered list; you can of course make the selector more specific. Using .data(), it extracts the value of the span's data-reactid attribute.

var reactid = $(data).find('ul > li span').data('reactid');
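.data('reactid') works because the data- prefix is dropped when an attribute name becomes a data key. Under the HTML dataset naming rule, each hyphen followed by a lowercase ASCII letter is also removed and the letter uppercased, so data-react-id would map to reactId. jQuery follows a similar camelCase convention; whether a particular cheerio version maps names exactly the same way is an assumption worth checking. A standalone sketch of the dataset rule (the function name is hypothetical):

```javascript
// Sketch of the HTML dataset naming rule: drop the "data-" prefix, then turn
// every "-" followed by a lowercase ASCII letter into that letter uppercased.
function dataKeyFromAttribute(attrName) {
  const prefix = "data-";
  if (!attrName.startsWith(prefix)) return null; // not a data-* attribute
  return attrName
    .slice(prefix.length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(dataKeyFromAttribute("data-reactid"));  // reactid
console.log(dataKeyFromAttribute("data-react-id")); // reactId
```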

Upvotes: 0
