Reputation:
I want to write a program that searches through a fairly large website and extracts certain things. I've had a couple of online Python courses, but neither said anything about how to access the internet with Python. I have no idea where I ought to start with this.
Upvotes: 5
Views: 43861
Reputation: 549
First, read up on the standard Python library urllib2 (in Python 3 the same functionality lives in urllib.request).
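For example, fetching a page and reading its contents takes only a couple of lines. A minimal sketch (the URL is a placeholder):

```python
# Minimal page fetch with urllib2 (Python 2; under Python 3 use urllib.request instead).
import urllib2

response = urllib2.urlopen('http://example.com/')  # placeholder URL
html = response.read()                             # the raw HTML as a string
print(html[:200])                                  # peek at the first 200 characters
```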
Once you are comfortable with the basic ideas behind that library, you can try requests, which makes interacting with the web, and especially with APIs, much easier. I suggest using it alongside httpie to test queries quickly from the command line.
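For instance, a quick sketch of a GET request against a JSON API with requests (the URL and the query parameter are made up for illustration; install the library with `pip install requests`):

```python
import requests

# Fetch a resource, passing query parameters as a dict instead of building the URL by hand.
response = requests.get('http://example.com/api/items', params={'q': 'python'})
response.raise_for_status()   # raise an exception on 4xx/5xx status codes
data = response.json()        # decode a JSON body straight into Python objects
print(data)
```

The rough httpie equivalent from the command line would be `http GET http://example.com/api/items q==python`, which is handy for poking at an API before writing any code.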
If you go a little further and build a library or an engine to crawl the web, you will need some sort of asynchronous programming; I recommend starting with Gevent.
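A sketch of what that might look like with Gevent, downloading several pages concurrently (the URLs are placeholders):

```python
from gevent import monkey
monkey.patch_all()   # patch blocking stdlib I/O so each greenlet yields while waiting on the network

import gevent
import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

urls = ['http://example.com/page1', 'http://example.com/page2', 'http://example.com/page3']
jobs = [gevent.spawn(fetch, url) for url in urls]   # start one greenlet per URL
gevent.joinall(jobs, timeout=10)                    # wait for all of them (or the timeout)
pages = [job.value for job in jobs]                 # collected page bodies
```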
Finally, if you want to create a crawler/bot, you can take a look at Scrapy. However, you should start with the basic libraries before diving into it, as it can get quite complex.
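To give a feel for it, a bare-bones Scrapy spider might look like this (the domain and the link-extraction logic are just for illustration):

```python
import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['http://example.com/']   # placeholder start page

    def parse(self, response):
        # Yield the text and target of every link on the page as an item.
        for link in response.css('a'):
            yield {
                'text': link.css('::text').extract_first(),
                'href': link.css('::attr(href)').extract_first(),
            }
```

Saved as links_spider.py, something like `scrapy runspider links_spider.py -o links.json` would run it and dump the extracted items to a JSON file.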
Upvotes: 5
Reputation: 15692
There is much more on the internet than just websites, but I assume you just want to crawl some HTML pages and extract data from them. You have many options for solving that problem. Just some starting points:
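As one such starting point, here is a rough sketch using nothing but the standard library: urllib2 to fetch a page and HTMLParser to pull the links out of it (the URL is a placeholder; in Python 3 the imports become urllib.request and html.parser):

```python
import urllib2
from HTMLParser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

html = urllib2.urlopen('http://example.com/').read()  # placeholder URL
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```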
Upvotes: 2
Reputation: 1473
It sounds like you want a web crawler/scraper. What sorts of things do you want to pull? Images? Links? Either way, a crawler/scraper is the right tool for the job.
Start there; there are plenty of articles on Stack Overflow that will help you implement the details, such as connecting to the internet and getting a web response.
See this article.
Upvotes: 3