MattHodson

Reputation: 796

Removing Columns from an HTML Table

I'm trying to delete the 3rd and 4th <td> and <th> from every row of my table using HtmlAgilityPack.

Example table string:

<table>
   <thead>
      <tr>
         <th>Item</th>
         <th>Price</th>
         <th>Change</th>
         <th></th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>
            <h2>Top Menu Items</h2>
         </td>
      </tr>
      <tr>
         <td> Diced Angus Steak <span>(7oz)</span></td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Kimchi Cheese Beef Pepper Rice</td>
         <td>$15.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Classic Beef Pepper Rice</td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Steaks</h2>
         </td>
      </tr>
      <tr>
         <td> Angus Rib Eye Steak <span>(8oz)</span></td>
         <td>$25.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Angus Sirloin Steak <span>(8oz)</span></td>
         <td>$22.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Diced Angus Steak <span>(7oz)</span> <span>(Steaks)</span></td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Chicken Breast Steak <span>(8oz)</span></td>
         <td>$14.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Premium Hamburger Steak <span>(10oz)</span></td>
         <td>$16.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Pepper Rice</h2>
         </td>
      </tr>
      <tr>
         <td> Sambar Pepper Rice</td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Kimchi Cheese Beef Pepper Rice <span>(Pepper Rice)</span></td>
         <td>$15.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Chicken Pepper Rice</td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Salmon Pepper Rice</td>
         <td>$15.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Classic Beef Pepper Rice <span>(Pepper Rice)</span></td>
         <td>$13.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Sides</h2>
         </td>
      </tr>
      <tr>
         <td> Rice</td>
         <td>$3.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Miso Soup</td>
         <td>$3.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Sauteed String Beans</td>
         <td>$4.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Sauteed Corn</td>
         <td>$4.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Kimchi</td>
         <td>$5.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> French Fries</td>
         <td>$4.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Onion Rings</td>
         <td>$5.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Deep Fried Dumpling</td>
         <td>$8.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Sausages</td>
         <td>$7.50</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Salad</h2>
         </td>
      </tr>
      <tr>
         <td> Large Salad</td>
         <td>$7.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Small Salad</td>
         <td>$3.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Large Seaweed Salad</td>
         <td>$9.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
      <tr>
         <td> Small Seaweed Salad</td>
         <td>$5.00</td>
         <td>
            - -
         </td>
         <td>
            <span>
            </span>
            <span>
            </span>
         </td>
      </tr>
   </tbody>
</table>

I pass the string above to this method to remove the 3rd and 4th <td> and <th> from each row.

public static string deleteCols(string table)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(table);

    bool first = true;
    foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//tr"))
    {
        if (first)
        {
            try
            {
                var th3 = row.SelectSingleNode("th[3]");
                row.RemoveChild(th3);
            }
            catch
            {

            }
            try
            {
                var th4 = row.SelectSingleNode("th[4]");
                row.RemoveChild(th4);
            }
            catch
            {

            }
            first = false;
        }
        else
        {
            try
            {
                var td3 = row.SelectSingleNode("td[3]");
                row.RemoveChild(td3);
            }
            catch
            {

            }
            try
            {
                var td4 = row.SelectSingleNode("th[4]");
                row.RemoveChild(td4);
            }
            catch
            {

            }
        }
    }


    foreach (HtmlNode row2 in doc.DocumentNode.SelectNodes("//span"))
    {
        row2.Remove();
    }

    return doc.DocumentNode.InnerHtml;

}

Which gives me the following result:

<table>
   <thead>
      <tr>
         <th>Item</th>
         <th>Price</th>
         <th></th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>
            <h2>Top Menu Items</h2>
         </td>
      </tr>
      <tr>
         <td> Diced Angus Steak </td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Kimchi Cheese Beef Pepper Rice</td>
         <td>$15.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Classic Beef Pepper Rice</td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Steaks</h2>
         </td>
      </tr>
      <tr>
         <td> Angus Rib Eye Steak </td>
         <td>$25.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Angus Sirloin Steak </td>
         <td>$22.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Diced Angus Steak  </td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Chicken Breast Steak </td>
         <td>$14.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Premium Hamburger Steak </td>
         <td>$16.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Pepper Rice</h2>
         </td>
      </tr>
      <tr>
         <td> Sambar Pepper Rice</td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Kimchi Cheese Beef Pepper Rice </td>
         <td>$15.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Chicken Pepper Rice</td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Salmon Pepper Rice</td>
         <td>$15.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Classic Beef Pepper Rice </td>
         <td>$13.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Sides</h2>
         </td>
      </tr>
      <tr>
         <td> Rice</td>
         <td>$3.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Miso Soup</td>
         <td>$3.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Sauteed String Beans</td>
         <td>$4.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Sauteed Corn</td>
         <td>$4.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Kimchi</td>
         <td>$5.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> French Fries</td>
         <td>$4.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Onion Rings</td>
         <td>$5.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Deep Fried Dumpling</td>
         <td>$8.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Sausages</td>
         <td>$7.50</td>
         <td>
         </td>
      </tr>
      <tr>
         <td>
            <h2>Salad</h2>
         </td>
      </tr>
      <tr>
         <td> Large Salad</td>
         <td>$7.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Small Salad</td>
         <td>$3.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Large Seaweed Salad</td>
         <td>$9.00</td>
         <td>
         </td>
      </tr>
      <tr>
         <td> Small Seaweed Salad</td>
         <td>$5.00</td>
         <td>
         </td>
      </tr>
   </tbody>
</table>

As you can see, some of the elements I wish to delete are still there. Does anybody know what I'm doing wrong here?!

Upvotes: 0

Views: 278

Answers (1)

Sargis Tovmasyan

Reputation: 123

When you remove the 3rd th/td from a row's children, the 4th cell becomes the 3rd, so the second lookup is trying to remove an element that no longer exists at index 4.

As a solution, you can either store both elements in variables first and then delete them, or remove the 4th element before the 3rd.
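
A minimal sketch of the "store first, then delete" option, assuming the same HtmlAgilityPack setup as in the question. The class name TableColumns and the method name DeleteCols are made up for illustration. Each row's cells are copied into a list before anything is removed, so removing the 3rd cell cannot shift the 4th. Note also that the td branch in the question selects "th[4]" rather than "td[4]", so that lookup would not match a data row even without the shifting problem. The span cleanup from the question is left out here.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class TableColumns
{
    // Removes the 3rd and 4th cell (th or td) from every row, if present.
    public static string DeleteCols(string table)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(table);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return table;
        }

        foreach (HtmlNode row in rows)
        {
            // Snapshot the row's cells before removing anything, so the
            // positions are fixed for the rest of this iteration.
            var cells = row.ChildNodes
                .Where(n => n.Name == "th" || n.Name == "td")
                .ToList();

            // Skip the first two cells and remove the next two.
            // Rows with fewer cells (the section heading rows) are left alone.
            foreach (HtmlNode cell in cells.Skip(2).Take(2))
            {
                cell.Remove();
            }
        }

        return doc.DocumentNode.InnerHtml;
    }
}
```

The other option keeps the XPath lookups from the question but reverses their order: select and remove "td[4]" first, then "td[3]", checking each result for null instead of relying on an empty catch block.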

Upvotes: 1
