Jono Tarantino

Reputation: 13

Delete Table Column with HTML Agility Pack

I scraped a table from a website using C# and loaded it into a string for use on my own site. It has too many columns, so I'm wondering whether there is an easy way to delete some of them, preferably with HTML Agility Pack, or in plain C# if necessary.

The table in the string looks like this:

    <table>
        <tr>
            <th scope="col">&nbsp; </th>
            <th scope="col">&nbsp; </th>
            <th scope="col">P </th>
            <th scope="col">W </th>
            <th scope="col">L </th>
            <th scope="col">T </th>
            <th scope="col">NR </th>
            <th scope="col">Bat </th>
            <th scope="col">Bowl </th>
            <th scope="col">Pen </th>
            <th scope="col">Pts </th>
        </tr>
        <tr>
            <td>1 </td>
            <td><a href="fixbyteam.aspx?clubid=44576&teamid=58170&divid=32181">Rayleigh 2nd</a> </td>
            <td>12 </td>
            <td>8 </td>
            <td>1 </td>
            <td>0 </td>
            <td>3 </td>
            <td>14 </td>
            <td>52 </td>
            <td>0 </td>
            <td>209 </td>
        </tr>
        <tr>
            <td>2 </td>
            <td><a href="fixbyteam.aspx?clubid=44612&teamid=58169&divid=32181">Rainham 1st</a> </td>
            <td>12 </td>
            <td>8 </td>
            <td>1 </td>
            <td>1 </td>
            <td>2 </td>
            <td>12 </td>
            <td>56 </td>
            <td>-15 </td>
            <td>199 </td>
        </tr>
        <tr class="lineAbove">
            <td>3 </td>
            <td><a href="fixbyteam.aspx?clubid=44571&teamid=58162&divid=32181">Old Chelmsfordians 2nd</a> </td>
            <td>12 </td>
            <td>5 </td>
            <td>5 </td>
            <td>0 </td>
            <td>2 </td>
            <td>10 </td>
            <td>48 </td>
            <td>0 </td>
            <td>148 </td>
        </tr>
        <tr>
            <td>4 </td>
            <td><a href="fixbyteam.aspx?clubid=44570&teamid=58161&divid=32181">Little Baddow 2nd</a> </td>
            <td>12 </td>
            <td>5 </td>
            <td>4 </td>
            <td>0 </td>
            <td>3 </td>
            <td>21 </td>
            <td>43 </td>
            <td>-15 </td>
            <td>144 </td>
        </tr>
        <tr>
            <td>5 </td>
            <td><a href="fixbyteam.aspx?clubid=44606&teamid=58159&divid=32181">Rayne 1st</a> </td>
            <td>12 </td>
            <td>5 </td>
            <td>4 </td>
            <td>0 </td>
            <td>3 </td>
            <td>6 </td>
            <td>39 </td>
            <td>0 </td>
            <td>140 </td>
        </tr>
        <tr>
            <td>6 </td>
            <td><a href="fixbyteam.aspx?clubid=44605&teamid=58158&divid=32181">Terling 1st</a> </td>
            <td>12 </td>
            <td>4 </td>
            <td>5 </td>
            <td>1 </td>
            <td>2 </td>
            <td>12 </td>
            <td>35 </td>
            <td>0 </td>
            <td>129 </td>
        </tr>
        <tr>
            <td>7 </td>
            <td><a href="fixbyteam.aspx?clubid=44602&teamid=58154&divid=32181">Willow Herbs 1st</a> </td>
            <td>12 </td>
            <td>4 </td>
            <td>6 </td>
            <td>0 </td>
            <td>2 </td>
            <td>9 </td>
            <td>34 </td>
            <td>0 </td>
            <td>117 </td>
        </tr>
        <tr>
            <td>8 </td>
            <td><a href="fixbyteam.aspx?clubid=50925&teamid=68864&divid=32181">Ongar 1st</a> </td>
            <td>12 </td>
            <td>3 </td>
            <td>5 </td>
            <td>0 </td>
            <td>4 </td>
            <td>3 </td>
            <td>42 </td>
            <td>-5 </td>
            <td>108 </td>
        </tr>
        <tr class="lineAbove">
            <td>9 </td>
            <td><a href="fixbyteam.aspx?clubid=44607&teamid=58163&divid=32181">Sandon Sports 1st</a> </td>
            <td>12 </td>
            <td>3 </td>
            <td>6 </td>
            <td>0 </td>
            <td>3 </td>
            <td>8 </td>
            <td>27 </td>
            <td>0 </td>
            <td>98 </td>
        </tr>
        <tr>
            <td>10 </td>
            <td><a href="fixbyteam.aspx?clubid=44582&teamid=58156&divid=32181">Little Waltham 2nd</a> </td>
            <td>12 </td>
            <td>1 </td>
            <td>9 </td>
            <td>0 </td>
            <td>2 </td>
            <td>14 </td>
            <td>25 </td>
            <td>0 </td>
            <td>65 </td>
        </tr>
    </table>

I want to delete columns 8-10 (Bat, Bowl and Pen). I'm not sure where to start, so any pointers would be helpful!

Upvotes: 1

Views: 2065

Answers (1)

Oded

Reputation: 499212

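You would need to iterate over each tr and remove the 8th, 9th and 10th cells from each: th nodes in the header row, td nodes in the others. Remove them from last to first, because XPath positions are 1-based and shift as soon as an earlier sibling is removed.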

    // Assumes doc is an HtmlAgilityPack.HtmlDocument already loaded with the table HTML.
    bool first = true;
    foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//tr"))
    {
        if (first)
        {
            // Header row: cells are th. Remove highest index first so positions stay valid.
            row.RemoveChild(row.SelectSingleNode("th[10]"));
            row.RemoveChild(row.SelectSingleNode("th[9]"));
            row.RemoveChild(row.SelectSingleNode("th[8]"));
            first = false;
        }
        else
        {
            // Data rows: cells are td.
            row.RemoveChild(row.SelectSingleNode("td[10]"));
            row.RemoveChild(row.SelectSingleNode("td[9]"));
            row.RemoveChild(row.SelectSingleNode("td[8]"));
        }
    }
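For completeness, here is a rough sketch of the same idea starting from the HTML string in the question and returning the trimmed HTML. The class name TableColumnRemover and the method RemoveColumns are made up for illustration and are not part of HTML Agility Pack. It assumes the HtmlAgilityPack package is referenced, that every row has one cell per column (no colspan), and it treats th and td alike so the header row needs no special case.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class TableColumnRemover
{
    // Hypothetical helper: removes the given 1-based columns from every row.
    public static string RemoveColumns(string tableHtml, params int[] columns)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(tableHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return tableHtml;

        foreach (HtmlNode row in rows)
        {
            // Direct child cells of this row, header or data.
            var cells = row.SelectNodes("th|td");
            if (cells == null) continue;

            // Pick the cells by index before removing anything,
            // so removals cannot shift the positions being looked up.
            var toRemove = columns
                .Distinct()
                .Where(c => c >= 1 && c <= cells.Count)
                .Select(c => cells[c - 1])
                .ToList();

            foreach (HtmlNode cell in toRemove)
                cell.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

With the table from the question in a string named html, the call would be something like `string trimmed = TableColumnRemover.RemoveColumns(html, 8, 9, 10);`. Rows with fewer cells than expected are simply left with whatever cells they have, rather than throwing.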

Upvotes: 2
