Retrieving data using regex in C#

Question

Data:

some. .data...
Black
57234
5431.60
  -125.02

some. .data...
Blue
57234
5431.60
  -125.02

some. .data...
Brown
57234
5431.60
  -125.02

...more data...

I want to extract 'some. .data...', 'Black', '57234', and '5431.60' in one pass. [The fifth td's data is not required.]

Initially,

([a-zA-Z0-9 -]+)(\w+)([\d]+\.\d+)(\d+\.\d+)

was working (I arrived at it by trial and error).

But now it's broken.

Now, when I use (.*) or <\w+>(.*), it shows data from the last four tds in every tr. But then why won't it show ..., and how can I get the data I want?

Oded · Accepted Answer

Regex is, in general, a bad way to parse HTML.

I suggest taking a look at the HTML Agility Pack or CsQuery, which are purpose-built HTML parsers for .NET.

The HTML Agility Pack can be queried using XPath and LINQ, and CsQuery uses jQuery selectors.
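
As a rough illustration of the XPath route, here is a minimal sketch using the HTML Agility Pack (NuGet package HtmlAgilityPack). The markup in the question did not survive, so the table layout below (one tr per record, five td cells each) is an assumption, and RowExtractor and ExtractRows are hypothetical names chosen for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party: install the "HtmlAgilityPack" NuGet package

public static class RowExtractor
{
    // Returns the text of the first four <td> cells of every <tr>
    // that contains at least four cells; the fifth cell is ignored.
    public static List<string[]> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string[]>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("./td");
            if (cells == null || cells.Count < 4) continue;

            result.Add(cells.Take(4)
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray());
        }
        return result;
    }

    public static void Main()
    {
        // Assumed markup, reconstructed from the values in the question.
        const string html = @"<table>
<tr><td>some. .data...</td><td>Black</td><td>57234</td><td>5431.60</td><td>  -125.02</td></tr>
<tr><td>some. .data...</td><td>Blue</td><td>57234</td><td>5431.60</td><td>  -125.02</td></tr>
</table>";

        foreach (var row in ExtractRows(html))
            Console.WriteLine(string.Join("; ", row));
    }
}
```

Selecting cells by position keeps the code independent of what the cell text looks like, which is where the regex approach tends to break. If the real page has nested tables or header rows, the "//tr" expression would likely need to be narrowed.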
