Dustin Sun

Reputation: 5532

A C# regex question: retrieve Google search results

I want to store Google search results (both title and link) in a database. The HTML of each search result looks like this:

<br/>
THETITLE

Each page has 10 results. Can anyone show me how to retrieve THEURL and THETITLE?

Thank you so much!

Upvotes: 1

Views: 426

Answers (3)

Matthew Flaschen

Reputation: 284927

Consider using the Google AJAX Search API instead. It will be easier on both you and Google's servers. There are some instructions for using it outside JavaScript environments. They don't give a C# example, but it shouldn't be difficult to adapt them to your needs using one of the JSON libraries for C#.
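As a rough illustration of the JSON route, here is a minimal sketch that uses System.Text.Json (.NET Core 3.0 or later) to pull URLs and titles out of a response string. The response shape, with `responseData.results[]` entries carrying `url` and `titleNoFormatting` fields, is an assumption about the AJAX Search API's format, so check the API documentation before relying on it. Fetching the response (e.g. with `HttpClient`) is left out, and the class name `AjaxSearchResults` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for parsing an AJAX Search API response.
public static class AjaxSearchResults
{
    // ASSUMPTION: the response looks like
    // {"responseData":{"results":[{"url":"...","titleNoFormatting":"..."}, ...]}}
    public static List<(string Url, string Title)> Parse(string json)
    {
        var results = new List<(string Url, string Title)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement items = doc.RootElement
                .GetProperty("responseData")
                .GetProperty("results");
            foreach (JsonElement item in items.EnumerateArray())
            {
                // GetString() returns a new string, so it stays valid after doc is disposed.
                string url = item.GetProperty("url").GetString();
                string title = item.GetProperty("titleNoFormatting").GetString();
                results.Add((url, title));
            }
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample response in the assumed shape.
        string json = "{\"responseData\":{\"results\":[" +
                      "{\"url\":\"http://example.com/a\",\"titleNoFormatting\":\"First hit\"}," +
                      "{\"url\":\"http://example.com/b\",\"titleNoFormatting\":\"Second hit\"}]}}";
        foreach (var (url, title) in Parse(json))
            Console.WriteLine($"{title} -> {url}");
    }
}
```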

If you do stick with HTML, I also recommend HTML Agility Pack.

You should also think about a caching policy (for example, how long stored results stay valid) so you minimize both stale data and unnecessary requests.
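One simple form of caching is a per-query cache with a time-to-live (TTL): a query is re-fetched only once its cached results are older than the TTL, so a shorter TTL means fresher data but more requests. This is a minimal in-memory sketch; `SearchCache`, `GetOrFetch` and the injectable clock are hypothetical names, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory cache: results are reused until they are older than the TTL.
// Not thread-safe; wrap access in a lock if several threads share one instance.
public sealed class SearchCache<T>
{
    private readonly Dictionary<string, (DateTime FetchedAt, T Value)> _entries =
        new Dictionary<string, (DateTime FetchedAt, T Value)>();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now; // injectable clock, handy for testing

    public SearchCache(TimeSpan ttl, Func<DateTime> now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public T GetOrFetch(string query, Func<string, T> fetch)
    {
        DateTime now = _now();
        if (_entries.TryGetValue(query, out var entry) && now - entry.FetchedAt < _ttl)
            return entry.Value; // still fresh: no request sent

        T value = fetch(query); // missing or stale: fetch and remember when
        _entries[query] = (now, value);
        return value;
    }
}

public static class Program
{
    public static void Main()
    {
        int requests = 0;
        var cache = new SearchCache<string>(TimeSpan.FromMinutes(30));
        // Stand-in for a real search request.
        Func<string, string> fetch = q => { requests++; return "results for " + q; };

        cache.GetOrFetch("regex", fetch);
        cache.GetOrFetch("regex", fetch); // served from the cache
        Console.WriteLine(requests); // prints 1
    }
}
```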

Upvotes: 0

Rubens Farias

Reputation: 57976

You should give Html Agility Pack a try. An HTML parser, not a regular expression, is the correct way to read HTML content.

BUT, if you want to try a regex anyway, at your own risk:

<h3 class=r><a [^>]*?href="(?<url>[^"]*)"[^>]*>(?<title>.*?)</a></h3>

You'll have problems with:

  • Line breaks
  • Unmatched tags
  • Minor HTML changes
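Here is a minimal C# sketch applying a pattern of this form with `System.Text.RegularExpressions`. Inside the `<a>` tag it uses `[^>]*?`, which keeps the match within the tag and also works when `href` is the first attribute. `RegexOptions.Singleline` lets `.` cross line breaks (the first problem listed), and inline tags such as `<em>` are stripped from titles. The `class=r` markup is taken from the pattern above and may not match the HTML Google actually serves; `SerpRegex` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; as noted above, this breaks as soon as the markup changes.
public static class SerpRegex
{
    // Singleline: '.' also matches '\n', so titles spanning lines still match.
    // IgnoreCase: tolerate <H3>/<A> written in upper case.
    private static readonly Regex ResultPattern = new Regex(
        @"<h3 class=r><a [^>]*?href=""(?<url>[^""]*)""[^>]*>(?<title>.*?)</a></h3>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex Tag = new Regex("<[^>]+>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static List<(string Url, string Title)> Extract(string html)
    {
        var results = new List<(string Url, string Title)>();
        foreach (Match m in ResultPattern.Matches(html))
        {
            // Drop inline markup such as <em>, collapse line breaks, decode entities like &amp;.
            string title = Tag.Replace(m.Groups["title"].Value, "");
            title = WebUtility.HtmlDecode(Whitespace.Replace(title, " ").Trim());
            string url = WebUtility.HtmlDecode(m.Groups["url"].Value);
            results.Add((url, title));
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<h3 class=r><a href=\"http://example.com/a\" class=l>First <em>hit</em></a></h3>\n" +
            "<h3 class=r><a class=l href=\"http://example.com/b\">Second\nhit &amp; more</a></h3>";
        foreach (var (url, title) in Extract(html))
            Console.WriteLine($"{url} | {title}");
    }
}
```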

So, good luck!

Upvotes: 3

t0mm13b

Reputation: 34592

For starters, I would not recommend using a regex for this; use the 'Html Agility Pack' to parse the HTML document instead.

Hope this helps, Best regards, Tom.

Upvotes: 1
