stanigator

Reputation: 10934

Difficulty of implementing this crawler

From your experience, how difficult do you think it would be to programmatically search for a term on the Yellow Pages website and then scrape the contact information from the results into a CSV file?

Upvotes: 1

Views: 128

Answers (3)

Tsubasa Kato

Reputation: 33

Using Perl and modules like WWW::Robot, it probably won't be that hard. I haven't tried it myself, but since you know Python, Scrapy might help: http://scrapy.org
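For illustration, here is a minimal, dependency-free Python sketch of the "scrape into a CSV" step. It uses the standard library's `html.parser` and `csv` modules rather than Scrapy. The markup it expects (a `div` with class `listing` containing `span`s with classes `name` and `phone`) is entirely hypothetical; the real results page will be structured differently, so the class names and the parsing logic would need adjusting.

```python
import csv
import io
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects contact fields from a search-results page.

    ASSUMPTION: each result is marked up as
      <div class="listing"><span class="name">...</span><span class="phone">...</span></div>
    These class names are hypothetical, not the site's real markup.
    """

    FIELDS = ["name", "phone"]

    def __init__(self):
        super().__init__()
        self.rows = []        # completed listings, one dict per result
        self._current = None  # listing currently being read, if any
        self._field = None    # field whose text is being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "listing" in classes:
            self._current = {field: "" for field in self.FIELDS}
        elif self._current is not None:
            for field in self.FIELDS:
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # Assumes no nested <div>s inside a listing.
            self.rows.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] += data.strip()


def listings_to_csv(html, out):
    """Parse `html` and write one CSV row per listing to the file-like `out`."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    writer = csv.DictWriter(out, fieldnames=ListingParser.FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return len(parser.rows)


# Made-up sample page matching the assumed markup.
sample = """
<div class="listing"><span class="name">Joe's Pizza</span>
  <span class="phone">555-0100</span></div>
<div class="listing"><span class="name">Ace Hardware</span>
  <span class="phone">555-0199</span></div>
"""

if __name__ == "__main__":
    buf = io.StringIO()
    listings_to_csv(sample, buf)
    print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.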

Remember not to hammer the site when you crawl, because your IP can get banned.
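A simple way to avoid hammering the site is to enforce a minimum delay between requests. This sketch assumes a 2-second interval, which is an arbitrary placeholder and not a figure published by Yellow Pages. The clock and sleep functions can be swapped out, so the logic can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive requests.

    The 2-second default is an assumed placeholder; honor the site's terms
    and any Crawl-delay it publishes instead.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request. The first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.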

Upvotes: 1

daniel gratzer

Reputation: 53881

With the right modules and libraries it's very doable! It depends on your tools, though: with Perl or Python you'll be all set. If you're trying to do this in C++, you may have a bit more pain heading your way.

If you provide more information about your situation (language, frameworks, constraints), I can be more specific.

Also, there are legal issues to consider with scraping, and I'm not sure what the Yellow Pages policy on bots is. Read their robots.txt before proceeding. http://www.robotstxt.org/ is a good place to start learning about this.
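If you go the Python route, the standard library can check robots.txt rules for you via `urllib.robotparser`. The rules below are made up for the example and are not Yellow Pages' actual rules; against a live site you would instead call `set_url()` with the site's `/robots.txt` address and then `read()`. The user-agent string is likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; these are NOT Yellow Pages' rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def load_rules(robots_txt):
    """Build a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    agent = "MyCrawler/0.1"  # hypothetical user-agent string
    for url in ("http://www.example.com/search?q=pizza",
                "http://www.example.com/about"):
        print(url, "allowed" if rules.can_fetch(agent, url) else "disallowed")
    print("crawl delay:", rules.crawl_delay(agent))
```

If `crawl_delay()` returns a value, it is worth honoring as the minimum gap between your requests.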

The best way to be both safe and legal is simply to use their API: http://developer.yp.com/

Upvotes: 0

Bill the Lizard

Reputation: 405745

Can you just use the YP Search API? Access is free, and it only takes a minute to set up a developer account.
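One advantage of an API over scraping is that results come back as structured data, so the CSV step becomes trivial. This sketch assumes a JSON response with a top-level `listings` array whose items have `name`, `phone` and `city` keys. That shape is hypothetical and should be replaced with whatever the YP API documentation actually specifies; fetching the response is left out.

```python
import csv
import io
import json

# HYPOTHETICAL response body: the real YP API's field names and nesting
# are not reproduced here; take them from the API documentation.
SAMPLE_RESPONSE = """
{"listings": [
  {"name": "Joe's Pizza", "phone": "555-0100", "city": "Springfield"},
  {"name": "Ace Hardware", "phone": "555-0199"}
]}
"""

FIELDS = ["name", "phone", "city"]  # assumed field names


def api_results_to_csv(body, out, fields=FIELDS):
    """Write one CSV row per listing in a JSON body; missing keys become ''."""
    listings = json.loads(body).get("listings", [])
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for item in listings:
        writer.writerow({field: item.get(field, "") for field in fields})
    return len(listings)


if __name__ == "__main__":
    buf = io.StringIO()
    api_results_to_csv(SAMPLE_RESPONSE, buf)
    print(buf.getvalue())
```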

Upvotes: 2
