Hrvoje

Reputation: 15162

Python - How to scrape paginated pages without pagination in URL

Here is a sample page:

https://www.ncbi.nlm.nih.gov/pubmed/?term=hg38

It has 40 results. How can I get to the next page using the URL, with something like:

https://www.ncbi.nlm.nih.gov/pubmed/?term=hg38**?page=2**

I know how to use scraping libraries (BS4, Selenium) but I don't know how to scrape sites like that. I've been playing with Google Chrome dev tools unsuccessfully.

I know PubMed has an API, but the API doesn't return the info that I need (whether an article is freely downloadable or not). What's the usual workflow for scraping sites like that in Python?

Upvotes: 0

Views: 1561

Answers (2)

Arun Augustine

Reputation: 1766

Scraping paginated information from a website doesn't require a page-specific URL. On most sites like this, the pagination link doesn't expose a real target in the page source; its href is just # or something similar.

When paginating with Selenium, you don't need to bother finding the URL behind the link. Instead, use the click method to click the Next option when it is available.

On the website mentioned above, iterate by clicking the Next option and yield results until the final page. The final page won't have a Next option, so you can stop there.
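The loop described above could be sketched roughly like this. It is a minimal sketch, not a tested PubMed scraper: `iter_result_pages`, `scrape_example`, and the `a.next` selector are made-up names for illustration, and the real selector has to be read off the page in dev tools. It also assumes that clicking Next triggers a normal page load; a page that updates its content with JavaScript would need an explicit wait after the click.

```python
import time


def iter_result_pages(driver, next_locator, max_pages=100, pause=0.0):
    """Yield the HTML of each result page, clicking "Next" in between.

    driver       -- a Selenium WebDriver (or any object with the same methods)
    next_locator -- (by, value) tuple for the "Next" control; site-specific
    max_pages    -- safety cap so a broken selector cannot loop forever
    pause        -- optional delay in seconds after each click
    """
    for _ in range(max_pages):
        yield driver.page_source
        # find_elements returns an empty list (no exception) when nothing matches
        candidates = driver.find_elements(*next_locator)
        clickable = [el for el in candidates if el.is_displayed() and el.is_enabled()]
        if not clickable:
            return  # no usable "Next" control: this was the last page
        clickable[0].click()
        if pause:
            time.sleep(pause)


def scrape_example():
    """Example wiring; needs selenium and a local Chrome, so it is not called here."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.ncbi.nlm.nih.gov/pubmed/?term=hg38")
        # "a.next" is a placeholder selector, NOT the site's real markup
        for html in iter_result_pages(driver, (By.CSS_SELECTOR, "a.next"), pause=1.0):
            print(len(html))  # hand `html` to BeautifulSoup or similar here
    finally:
        driver.quit()
```

Writing the loop as a generator keeps the paging logic separate from the parsing, so each page's HTML can be passed to whatever parser you already use.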

Upvotes: 1

9716278

Reputation: 2404

The pages are not part of the URL scheme. You should look at the Python Selenium driver. With Selenium you can load the page and have your program click buttons on it to change its content. This way you can get to page two of the results and then continue scraping the newly displayed HTML.

Python3 Selenium Driver

Selenium Documentation
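The single click-then-scrape step from this answer might look something like the sketch below. The helper name `click_and_get_new_html` and its parameters are invented for illustration. It works with any object that offers Selenium's `find_element` and `page_source`, and it simply polls until the HTML differs from what it was before the click. That comparison is crude (unrelated parts of a page can change too), so treat it as an assumption rather than a robust check.

```python
import time


def click_and_get_new_html(driver, locator, timeout=10.0, poll=0.25):
    """Click the element at `locator`, wait for the page HTML to change, return it.

    driver  -- a Selenium WebDriver (or any object with the same methods)
    locator -- (by, value) tuple for the button or link to click; site-specific
    Raises TimeoutError if the HTML is still unchanged after `timeout` seconds.
    """
    before = driver.page_source
    driver.find_element(*locator).click()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        after = driver.page_source
        if after != before:
            return after  # new content is displayed; parse this HTML
        time.sleep(poll)
    raise TimeoutError("page content did not change after the click")
```

In real Selenium code, `WebDriverWait` together with a condition from `selenium.webdriver.support.expected_conditions` (for example `staleness_of` on an element from the old page) is the more usual way to wait; the hand-rolled loop here only keeps the sketch free of browser dependencies.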

Upvotes: 1
