laogoat

Reputation: 25

How to capture the first pattern after certain string

I need to extract the first date/time that appears after 'Report Date', which is '25/01/2011 2:23 AM' in the sample below. Can anyone help?

<td colspan="2">
<table cellpadding="0" cellspacing="0" lang="en-AU">
<tr>
<td class="a31" style="WIDTH:39.50mm;word-wrap:break-word;HEIGHT:4.00mm;">Report Date</td>
</tr>
</table>
</td>
<td colspan="2">
<table cellpadding="0" cellspacing="0" lang="en-AU">
<tr>
<td class="a10" style="WIDTH:48.00mm;word-wrap:break-word;HEIGHT:4.00mm;">25/01/2011 2:23 AM</td>
</tr>
</table>
</td>
<td colspan="11">
</td>

Upvotes: 0

Views: 89

Answers (3)

mattkelly

Reputation: 626

If you really must use regex (since you asked...):

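// Inside a verbatim string (@"..."), a literal double quote is written as "".
Regex exp = new Regex(@"class=""a10"".*?>(\d+/\d+/\d+\s\d+:\d+\s[AP]M)");
MatchCollection MatchList = exp.Matches(InputText);
Match FirstMatch = MatchList[0]; // throws if nothing matched
string ReportDate = FirstMatch.Groups[1].Value; // the captured date/time text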

This gets all of the matches, storing them in MatchList. The first (and only, for this case) result is stored in FirstMatch. You may be able to skip the list creation if there's only ever going to be one field you need to capture.

However, like others have stated, you really shouldn't be explicitly using regex for this problem.
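
If the regex route is taken anyway, one way to tie the match to the 'Report Date' label rather than to the a10 class is sketched below. This is only a sketch under stated assumptions: the class name ReportDateRegex and the method TryFind are hypothetical, and the date is assumed to always be in day/month/year form with a 12-hour clock, as in the sample.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ReportDateRegex
{
    // Lazily skip everything after the label up to the first date/time.
    // Singleline lets '.' match newlines, because the value sits on a later line.
    private static readonly Regex Pattern = new Regex(
        @"Report Date.*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]M)",
        RegexOptions.Singleline);

    // Returns true and sets 'value' when a date/time follows the label.
    public static bool TryFind(string html, out DateTime value)
    {
        value = default(DateTime);
        Match m = Pattern.Match(html);
        if (!m.Success)
        {
            return false;
        }

        // Assumed format: day/month/year, 12-hour clock, AM/PM designator.
        return DateTime.TryParseExact(
            m.Groups[1].Value,
            "d/M/yyyy h:mm tt",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out value);
    }
}
```

Calling ReportDateRegex.TryFind(InputText, out var reportDate) on the sample should yield 25 January 2011, 2:23 AM. Regex.Match stops at the first match, so no MatchCollection is needed.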

Upvotes: 0

BrokenGlass

Reputation: 161002

Just use the Html Agility Pack instead. Using a regex for this special case might work, but it is not really maintainable in the long term.

For your example this would work:

HtmlDocument doc = new HtmlDocument();
doc.Load("test.html"); // path to your HTML file
var node = doc.DocumentNode.SelectSingleNode("//td[@class='a10']");
string myDateString = node.InnerText;
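
If the a10 class is not unique in the real page, a variation is to anchor on the 'Report Date' label and take the next cell that contains text. The following is a rough sketch, not a tested drop-in: the class ReportDateLookup and the method FindValueAfterLabel are hypothetical names, and it assumes the label text contains no single quotes, since it is pasted directly into the XPath expression.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class ReportDateLookup
{
    // Returns the text of the first non-empty <td> after the <td> whose
    // own text equals 'label', or null when no such cell exists.
    public static string FindValueAfterLabel(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: 'label' contains no single quotes.
        string xpath =
            "//td[text()[normalize-space(.)='" + label + "']]" +
            "/following::td[text()[normalize-space(.)!='']][1]";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }
}
```

For the sample in the question, FindValueAfterLabel(html, "Report Date") should return "25/01/2011 2:23 AM". The wrapper cells that hold only nested tables are skipped because their own text nodes are whitespace.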

Upvotes: 0

Petar Ivanov

Reputation: 93090

It is not a good idea to use regex to parse XML or HTML. It's complicated, and there are already plenty of parsers that take care of all the details for you. In C# you can use LINQ to XML for XML and the Html Agility Pack for HTML.

Upvotes: 3
