wingerse


Trouble parsing HTML when its structure is not consistent

I am new to CsQuery and I am having trouble parsing HTML like the following:

<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">1 pound</span>
    <span id="Name" class="ingredient-name">sweet Italian sausage</span>
</li>
<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">3/4 pound</span>
    <span id="Name" class="ingredient-name">lean ground beef</span>
</li>

I want to extract the text inside the span tags and format it as follows:

1 pound sweet Italian sausage
3/4 pound lean ground beef

This is my code:

// Run each query once instead of on every iteration
CQ ingredients = dom.Select("#Ingredient");
CQ amounts = dom.Select("#Ingredient span#Amount");
CQ names = dom.Select("#Ingredient span#Name");

for (int i = 0; i < ingredients.Length; ++i) {
    // Pairs the i-th amount and the i-th name found anywhere in the document
    if (i < amounts.Length)
        Console.Write(amounts[i].InnerHTML + " ");
    if (i < names.Length)
        Console.Write(names[i].InnerHTML);
    Console.WriteLine();
}

It works fine with the HTML above, but the problem arises when one of the spans is missing. For example, if <span id="Name" class="ingredient-name">sweet Italian sausage</span> were missing from the HTML, my code would print:

1 pound lean ground beef
3/4 pound
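A minimal sketch of why this happens, runnable without CsQuery: the two document-wide selections behave like two flat lists in document order, and the loop matches item i of one with item i of the other, no matter which <li> each span came from. IndexPairingDemo and PairByIndex are hypothetical names, and the hard-coded lists are an assumption standing in for what the Amount and Name selections return when the first <li> has no name span.

```csharp
using System;
using System.Collections.Generic;

public static class IndexPairingDemo
{
    // Mimics the loop in the question: the i-th amount is paired with the i-th name,
    // regardless of which <li> either value originally belonged to.
    public static List<string> PairByIndex(List<string> amounts, List<string> names, int ingredientCount)
    {
        var lines = new List<string>();
        for (int i = 0; i < ingredientCount; ++i)
        {
            var parts = new List<string>();
            if (i < amounts.Count) parts.Add(amounts[i]);
            if (i < names.Count) parts.Add(names[i]);
            lines.Add(string.Join(" ", parts));
        }
        return lines;
    }

    public static void Main()
    {
        // Stand-ins for the two selections when the first <li> has no name span
        var amounts = new List<string> { "1 pound", "3/4 pound" };
        var names = new List<string> { "lean ground beef" };

        // Prints "1 pound lean ground beef", then "3/4 pound"
        foreach (string line in PairByIndex(amounts, names, 2))
            Console.WriteLine(line);
    }
}
```

Because the name list is one element shorter, every name after the gap shifts up by one ingredient.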

As you can see, "lean ground beef" moved up to the first ingredient. I want it to stay with "3/4 pound" no matter what, and "1 pound" can be printed on its own. How can I do that? I have tried many approaches, but none of them worked. What I want is something like: for each "#Ingredient", write its "#Amount" if it exists and its "#Name" if it exists, without looking at the spans of any other ingredient.
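One way to express "for each #Ingredient" in code is to loop over the matched <li> elements and run the amount and name lookups inside each one, so a missing span only shortens its own line. This is a sketch, assuming the CsQuery NuGet package and its jQuery-style API (CQ.Create, Select, IDomObject.Cq, Find, Text) plus the markup from the question; IngredientFormatter and FormatIngredients are hypothetical names. It matches the spans by class rather than by id because the ids repeat in every item, and it wraps the sample in a <ul> so the <li> elements have a parent.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

public static class IngredientFormatter
{
    // Builds one line per ingredient, using only the spans inside that <li>.
    public static List<string> FormatIngredients(string html)
    {
        CQ dom = CQ.Create(html);
        var lines = new List<string>();

        foreach (IDomObject li in dom.Select("#Ingredient"))
        {
            // Wrap the current <li> so the lookups below are scoped to it
            CQ item = li.Cq();
            CQ amount = item.Find(".ingredient-amount");
            CQ name = item.Find(".ingredient-name");

            var parts = new List<string>();
            if (amount.Length > 0) parts.Add(amount.Text().Trim());
            if (name.Length > 0) parts.Add(name.Text().Trim());
            lines.Add(string.Join(" ", parts));
        }
        return lines;
    }

    public static void Main()
    {
        // The question's sample with the first ingredient's name span removed
        string html = @"<ul>
<li id=""Ingredient"">
    <span id=""Amount"" class=""ingredient-amount"">1 pound</span>
</li>
<li id=""Ingredient"">
    <span id=""Amount"" class=""ingredient-amount"">3/4 pound</span>
    <span id=""Name"" class=""ingredient-name"">lean ground beef</span>
</li>
</ul>";

        foreach (string line in FormatIngredients(html))
            Console.WriteLine(line);
    }
}
```

The design choice is that no index is shared between ingredients: each line is assembled from its own <li>, so the first line can only ever contain "1 pound" and "lean ground beef" stays with "3/4 pound".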
