takayoshi

Reputation: 2799

regexp parsing error in C#

I have HTML that contains text like this:

.......
<a class="product_name" href="index.php?productID=29785">Funny</a>
........
<a class="product_name" href="index.php?productID=29787">Very Funny</a>
......

I'd like to extract the href attribute value and the link text, so I'd like to get

"index.php?productID=29785", "Funny"
"index.php?productID=29787", "Very Funny"

And I use

MatchCollection mc = Regex.Matches(pageData, 
   "<a class=\"product_name\" href=\"(.+)\">(.+)</a>");

for this. But when I debug the code, I see that mc.Count is 0.

I think I didn't escape the quotes properly, but I'm not sure.

Upvotes: 0

Views: 113

Answers (2)

Oded

Reputation: 499352

Don't parse HTML with regex. See here for a compelling reason why.

Use the HTML Agility Pack instead.
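
As a rough illustration of that suggestion, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is installed and that the markup looks like the sample in the question. The names ProductLinks and Extract are made up for this example; they are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper for illustration only.
static class ProductLinks
{
    // Returns (href, link text) pairs for every <a class="product_name"> element.
    public static List<(string Href, string Text)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Href, string Text)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@class='product_name']");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", string.Empty);
            results.Add((href, anchor.InnerText.Trim()));
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        // Sample markup copied from the question.
        string pageData =
            "<a class=\"product_name\" href=\"index.php?productID=29785\">Funny</a>\n" +
            "<a class=\"product_name\" href=\"index.php?productID=29787\">Very Funny</a>";

        foreach (var (href, text) in ProductLinks.Extract(pageData))
        {
            Console.WriteLine($"\"{href}\", \"{text}\"");
        }
    }
}
```

The XPath here matches only elements whose class attribute is exactly product_name. If the real page puts several classes on the link, the expression would need to be loosened, for example with contains(@class, 'product_name').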

Upvotes: 5

Mikhail

Reputation: 9300

Review the following threads to find possible solution(s):

http://www.dotnetperls.com/scraping-html

Regex to Parse Hyperlinks and Descriptions

Parse HTML links using C#
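
If you still want to stay with a regex, here is a hedged sketch of a more tolerant pattern. The escaping in the question is valid for a regular C# string literal, so a count of 0 more likely means the real markup differs from the sample (extra whitespace, a different attribute order, single quotes). The names LinkRegex and Extract are hypothetical, and the pattern assumes class comes before href and that the link text contains no nested tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class LinkRegex
{
    // Verbatim string: a double quote is written as "" inside @"...".
    // [^""]* and [^<]* stop at the next quote or tag, so one match cannot
    // swallow several links the way a greedy .+ can.
    private static readonly Regex Pattern = new Regex(
        @"<a\s+class=""product_name""\s+href=""([^""]*)""\s*>([^<]*)</a>",
        RegexOptions.IgnoreCase);

    public static List<(string Href, string Text)> Extract(string html)
    {
        var results = new List<(string Href, string Text)>();
        foreach (Match m in Pattern.Matches(html))
        {
            results.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        // Sample markup copied from the question.
        string pageData =
            "<a class=\"product_name\" href=\"index.php?productID=29785\">Funny</a>\n" +
            "<a class=\"product_name\" href=\"index.php?productID=29787\">Very Funny</a>";

        foreach (var (href, text) in LinkRegex.Extract(pageData))
        {
            Console.WriteLine($"\"{href}\", \"{text}\"");
        }
    }
}
```

It is worth dumping a fragment of the real pageData around one of the links and comparing it character by character with the pattern, since any difference in attribute order or quoting will make this match fail as well.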

Upvotes: -1
