Reputation: 469
I'm learning how to web-scrape with Python and I'm wondering whether it's possible to grab two pages with requests.get()
so that I don't have to make two separate calls and two separate variables. For example:
r1 = requests.get("page1")
r2 = requests.get("page2")
pg1 = BeautifulSoup(r1.content, "html.parser")
pg2 = BeautifulSoup(r2.content, "html.parser")
As you can see there's repeated code. Any way around this? Thanks!
Upvotes: 4
Views: 9580
Reputation: 738
You can use sequence unpacking with a list comprehension, although it isn't much shorter with only two pages.
pg1, pg2 = [BeautifulSoup(requests.get(page).content, "html.parser")
            for page in ["page1", "page2"]]
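If the list of pages grows beyond two, a dict comprehension keeps each parsed page addressable by its URL instead of relying on positional unpacking. A minimal sketch of the pattern, where fetch() and parse() are hypothetical stand-ins for requests.get(url).content and BeautifulSoup(content, "html.parser") so it runs without network access:

```python
def fetch(url):
    # Hypothetical stand-in for requests.get(url).content
    return f"<html><title>{url}</title></html>"

def parse(content):
    # Hypothetical stand-in for BeautifulSoup(content, "html.parser")
    return content.upper()

urls = ["page1", "page2", "page3"]

# One expression fetches and parses every page, keyed by URL.
pages = {url: parse(fetch(url)) for url in urls}

print(pages["page1"])
```

With real requests/BeautifulSoup calls substituted in, you'd write pages["page1"] instead of pg1, and adding a fourth page is just one more list entry.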
Upvotes: 6
Reputation: 191
I like the grequests library for fetching multiple URLs at one time, instead of requests, especially when dealing with a lot of URLs or a single site with many sub-pages.
import grequests
urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
unsent_request = (grequests.get(url) for url in urls)
results = grequests.map(unsent_request)
After this, results can be processed however you need. This works well with JSON data: results[0] is the first URL's data, results[1] the second, etc.
more can be found here
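If installing grequests isn't an option, the standard library's concurrent.futures gives a similar fan-out-then-collect pattern with plain requests. A sketch under that assumption, using a hypothetical stand-in fetch() in place of requests.get so it runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Hypothetical stand-in for requests.get(url); a real version
    # would return a Response object instead of a string.
    return f"response from {url}"

urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Executor.map preserves input order, much like grequests.map,
# so results[0] corresponds to urls[0], and so on.
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    results = list(pool.map(fetch, urls))

print(results[0])
```

The threads overlap the network waits the same way grequests does, without pulling in gevent.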
Upvotes: 10