Leo5188

Reputation: 2067

Suggestions for obtaining Google search results and cleaning HTML tags

I am working on a project that fetches Google search result pages and then strips the HTML tags to obtain the plain text content.

Any suggestions for available tools (especially Python tools)?

Many thanks.

Upvotes: 1

Views: 434

Answers (3)

Leo5188

Reputation: 2067

I finally found a nice suite: BootCaT.

Upvotes: 0

John Lehmann

Reputation: 8225

I'd check out Pattern, a Python web mining module that provides a suite of text retrieval, analysis, and visualization tools. I haven't personally used it, but it looks powerful.

Module pattern.web is a web toolkit that bundles various APIs (Google, Gmail, Bing, Twitter, Wikipedia, Flickr) with a robust HTML parser and web spider. Its purpose is to retrieve online content in an easy-to-use, uniform way.

Upvotes: 2

josh-cain

Reputation: 5226

Python has a built-in parser that's actually pretty quick, found here. There's also a really powerful one called Beautiful Soup that offers additional functionality, especially for HTML scraping.
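
For illustration, here is a minimal sketch of tag stripping with only the standard library. It assumes the built-in option is the html.parser module (the original link isn't shown, so that is a guess), and the names TextExtractor and strip_tags are made up for this example. It drops the contents of script and style elements and collapses whitespace; note that it puts a space between adjacent text nodes, so "foo<b>bar</b>" comes out as "foo bar".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (illustrative, not exhaustive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><script>var x = 1;</script>"
    print(strip_tags(sample))
```

Beautiful Soup would be the more forgiving choice for messy real-world pages; the sketch above is only meant to show that the standard library can handle the simple cases.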

However, I also have to ask: why not use the search API?

Upvotes: 0
