Reputation: 15162
Here is a sample page:
https://www.ncbi.nlm.nih.gov/pubmed/?term=hg38
It has 40 results. How can I get to the next page through the URL, with something like:
https://www.ncbi.nlm.nih.gov/pubmed/?term=hg38**?page=2**
I know how to use scraping libraries (BS4, Selenium) but I don't know how to scrape sites like that. I've been playing with Google Chrome dev tools unsuccessfully.
I know PubMed has an API, but the API doesn't return the info that I need (whether an article is freely downloadable or not). What's the usual workflow for scraping sites like that in Python?
Upvotes: 0
Views: 1561
Reputation: 1766
Scraping paginated information from a website doesn't require a page-specific URL. On most sites the pagination link won't show a real target in the page source. It will be `#` or something like that.
When paginating with Selenium, you don't need to bother finding the URL behind the link. Instead, use the `click()` method to click the `next` option available on the page.
For the website mentioned above, keep clicking the `next` option and yield each page until you reach the final one. The final page won't have a `next` option, so you can quit there.
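A minimal sketch of that loop, in case it helps. The generic part (`iter_pages`) only needs two callables, so it can be tried without a browser. `scrape_with_selenium` is a hypothetical wiring to Selenium 4, and the link text `"Next"` is an assumption, not something checked against the PubMed page. A real scraper would also need an explicit wait after each click so the new results have loaded before the HTML is read.

```python
from typing import Callable, Iterator, List


def iter_pages(get_html: Callable[[], str],
               click_next: Callable[[], bool],
               max_pages: int = 100) -> Iterator[str]:
    """Yield the HTML of each page until there is no "next" control left."""
    for _ in range(max_pages):  # hard cap so a broken page cannot loop forever
        yield get_html()
        if not click_next():  # no "next" option: this was the final page
            return


def scrape_with_selenium(url: str, next_link_text: str = "Next") -> List[str]:
    """Hypothetical Selenium wiring; next_link_text is an assumed label."""
    # Imported here so iter_pages stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)

        def click_next() -> bool:
            try:
                driver.find_element(By.PARTIAL_LINK_TEXT, next_link_text).click()
            except NoSuchElementException:
                return False  # the final page has no "next" link
            return True

        # Consume the generator before the driver is closed.
        return list(iter_pages(lambda: driver.page_source, click_next))
    finally:
        driver.quit()
```

Keeping the loop separate from the browser code means the stopping logic can be exercised with fake pages, and only `click_next` has to change if the site's "next" control turns out to be a button or needs a different locator.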
Upvotes: 1
Reputation: 2404
The pages are not part of the URL scheme. You should look at the Python Selenium driver. With Selenium you can load the page and have your program click buttons to change the content on the page. This way you can get to page two of the results and then continue to scrape the newly displayed HTML.
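Once you have the newly displayed HTML as a string (for example from Selenium's `driver.page_source`), the scraping step is ordinary parsing. Here is a rough sketch using only the standard library so that it is self-contained. BS4, which you already know, would do the same job. `LinkCollector` and `extract_links` are names made up for this example, and the sample markup in the usage note is hypothetical, not PubMed's actual HTML.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if there is no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Return all (href, text) pairs found in one page of HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For instance, `extract_links('<a href="/a/1">First</a>')` returns `[("/a/1", "First")]`. You would call it once per page and filter the results for whatever marks an article as freely available on the real site.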
Upvotes: 1