Reputation: 25
I need to extract the first date/time value, i.e. the cell that follows 'Report Date', which is '25/01/2011 2:23 AM' in the sample below. Can anyone help?
<td colspan="2">
<table cellpadding="0" cellspacing="0" lang="en-AU">
<tr>
<td class="a31" style="WIDTH:39.50mm;word-wrap:break-word;HEIGHT:4.00mm;">Report Date</td>
</tr>
</table>
</td>
<td colspan="2">
<table cellpadding="0" cellspacing="0" lang="en-AU">
<tr>
<td class="a10" style="WIDTH:48.00mm;word-wrap:break-word;HEIGHT:4.00mm;">25/01/2011 2:23 AM</td>
</tr>
</table>
</td>
<td colspan="11">
</td>
Upvotes: 0
Views: 89
Reputation: 626
If you really must use regex (since you asked...):
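// Quotes inside a verbatim string must be doubled; [AP]M accepts both AM and PM.
Regex exp = new Regex(@"class=""a10""[^>]*>(\d+/\d+/\d+\s\d+:\d+\s[AP]M)");
MatchCollection MatchList = exp.Matches(InputText);
Match FirstMatch = MatchList[0];
string ReportDate = FirstMatch.Groups[1].Value; // the captured date/time text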
This gets all of the matches and stores them in MatchList. The first (and, in this case, only) result is stored in FirstMatch; the date itself is the first capture group, FirstMatch.Groups[1].Value. If there will only ever be one field to capture, you can skip the collection and call exp.Match(InputText) instead.
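For what it's worth, here is a rough sketch of the single-match variant that keys on the 'Report Date' label rather than on the a10 class, then parses the captured text into a DateTime. The class name ReportDateRegex, the method Extract, and the d/M/yyyy h:mm tt format are my own assumptions based on the one sample value in the question, so adjust them to the real report.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ReportDateRegex
{
    // After the label, skip lazily (across line breaks) to the first tag
    // whose text looks like d/M/yyyy h:mm AM|PM.
    private static readonly Regex Pattern = new Regex(
        @"Report Date.*?>\s*(\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}\s+[AP]M)\s*<",
        RegexOptions.Singleline);

    public static DateTime? Extract(string html)
    {
        // Regex.Match returns only the first match, so no MatchCollection is needed.
        Match m = Pattern.Match(html);
        if (!m.Success) return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            m.Groups[1].Value,
            "d/M/yyyy h:mm tt",              // assumed from the sample value
            CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Trimmed copy of the sample from the question.
        string html =
            "<td class=\"a31\">Report Date</td></tr></table></td>\n" +
            "<td colspan=\"2\"><table><tr>\n" +
            "<td class=\"a10\">25/01/2011 2:23 AM</td>";

        DateTime? reportDate = ReportDateRegex.Extract(html);
        Console.WriteLine(reportDate.HasValue
            ? reportDate.Value.ToString("s")
            : "not found");
    }
}
```

Anchoring on the label means the pattern should still find the date if the generated class names change, but it will break as soon as the markup between the two cells contains something that looks like a date.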
However, as others have stated, you really shouldn't be using regex for this problem.
Upvotes: 0
Reputation: 161002
Just use the Html Agility Pack instead. A regex might work for this special case, but it is not maintainable in the long term.
For your example this would work:
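HtmlDocument doc = new HtmlDocument(); // requires: using HtmlAgilityPack;
doc.Load("test.html"); // path to your HTML file
var node = doc.DocumentNode.SelectSingleNode("//td[@class='a10']");
// SelectSingleNode returns null when nothing matches
string myDateString = node != null ? node.InnerText.Trim() : null;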
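If the a10 class turns out not to be stable (it looks auto-generated), a possible variation is to locate the cell by its 'Report Date' label instead. This is only a sketch: the helper name FindValueAfterLabel is made up, and the XPath assumes the layout from the question, where label and value each sit in their own nested table inside adjacent outer cells.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of the Html Agility Pack itself.
public static class ReportDateLookup
{
    public static string FindValueAfterLabel(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // 1. the innermost <td> whose text is the label
        // 2. up to the outer <td> that wraps its nested table
        // 3. across to the next outer <td>, then down to the first <td> inside it
        // Note: the label is concatenated in, so it must not contain a single quote.
        string xpath =
            "//td[not(.//td) and normalize-space(.)='" + label + "']" +
            "/ancestor::td[1]/following-sibling::td[1]//td";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null
            ? null
            : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Called as ReportDateLookup.FindValueAfterLabel(html, "Report Date"), this should return the text of the value cell for the sample in the question, or null when the label is not found.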
Upvotes: 0
Reputation: 93090
It is not a good idea to use regex to parse XML or HTML. It's complicated, and there are already plenty of parsers that take care of all the details for you. In C# you can use LINQ-to-XML for XML and the Html Agility Pack for HTML.
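To illustrate the LINQ-to-XML side: the fragment in the question happens to be well-formed XML, so something like the sketch below could work on it. That is an assumption about this particular report, not about HTML in general; real-world HTML usually is not valid XML, in which case XElement.Parse throws and the Html Agility Pack is the better fit. The class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; only usable when the markup is well-formed XML.
public static class ReportDateXml
{
    public static string FindValueAfterLabel(string fragment, string label)
    {
        // The fragment has several top-level <td> elements, so wrap it in a
        // made-up root element to get a single document node.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // Text of the leaf cells (<td> without nested <td>), in document order.
        List<string> cells = root.Descendants("td")
            .Where(td => !td.Descendants("td").Any())
            .Select(td => td.Value.Trim())
            .ToList();

        // The value is taken to be the leaf cell right after the label cell.
        int i = cells.IndexOf(label);
        return (i >= 0 && i + 1 < cells.Count) ? cells[i + 1] : null;
    }
}
```

For the sample in the question, ReportDateXml.FindValueAfterLabel(fragment, "Report Date") should give back the '25/01/2011 2:23 AM' string.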
Upvotes: 3