elliot

Reputation: 1944

Scraping information from a webpage that has a table spanning many pages

I'm using the rvest package in R to scrape a table of job listings, but the first page of the table shows only about 40% of the total listings. I followed this blog post, but it doesn't explain how to scrape the data when the different pages all share the same URL. This website is the one I'm trying to obtain the job listing data from.

I've successfully retrieved the data on the first page using this code:

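library(rvest)  # read_html(), html_node(), html_text(); also re-exports %>%

# Download and parse the first page of results
job_page <-
  read_html(
    'page_address'
  )

# Extract the first <table> on the page as a single text string
data_raw <- job_page %>%
  html_node('table') %>%
  html_text()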

Is it possible to scrape every page of the table when the URL stays the same across the pages? Ideally I would use lapply to iterate over the pages.

Upvotes: 0

Views: 114

Answers (1)

Yifu Yan

Reputation: 6106

Try this URL instead; it should return all results on a single page:

http://explore.msujobs.msstate.edu/cw/en-us/filter/?search-keyword=&job-mail-subscribe-privacy=agree&location=main%20campus%20-%20starkville%20ms&category=faculty&page=1&page-items=100
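A minimal sketch of reading that single URL with rvest, assuming the listings are still in the first HTML <table> on the page (as in the question's code) and that page-items=100 covers all current listings. html_table() returns a data frame instead of the single text string that html_text() gives. The name read_jobs_table is hypothetical.

```r
# Hypothetical helper: read one results page and return its first <table>
# as a data frame. Needs the rvest package at call time; the calls are
# namespace-qualified so the function can be defined without attaching rvest.
read_jobs_table <- function(url) {
  page <- rvest::read_html(url)
  # Assumption: the job listings are the first <table> on the page
  table_node <- rvest::html_node(page, "table")
  rvest::html_table(table_node)
}

# The answer's URL, split across lines only for readability
all_jobs_url <- paste0(
  "http://explore.msujobs.msstate.edu/cw/en-us/filter/",
  "?search-keyword=&job-mail-subscribe-privacy=agree",
  "&location=main%20campus%20-%20starkville%20ms",
  "&category=faculty&page=1&page-items=100"
)

# jobs <- read_jobs_table(all_jobs_url)  # requires network access
```

If the number of listings ever exceeds page-items, the data frame would silently be incomplete, so it is worth comparing its row count with the total the site reports.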

You can open the developer tools in Chrome and select the Network tab. There you can examine the request the page sends and tweak the search parameters, such as page and page-items in the URL above.
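Once the Network tab shows which query parameters the server reads, the URL can be built in code rather than edited by hand, which also gives the lapply loop the question asks for. This is a sketch under assumptions: the parameter names are copied from the URL above and are assumed to behave as their names suggest, each page's listings are assumed to sit in its first <table>, and build_jobs_url and scrape_jobs are hypothetical names.

```r
# Hypothetical helper: build one search-results URL from query parameters.
# Parameter names come from the answer's URL; their effect is an assumption.
build_jobs_url <- function(page = 1, page_items = 100,
                           location = "main campus - starkville ms",
                           category = "faculty") {
  base <- "http://explore.msujobs.msstate.edu/cw/en-us/filter/"
  params <- c(
    "search-keyword" = "",
    "job-mail-subscribe-privacy" = "agree",
    "location" = location,
    "category" = category,
    "page" = as.character(page),
    "page-items" = as.character(page_items)
  )
  # Percent-encode each value (spaces become %20); the names are already URL-safe
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Hypothetical scraper: fetch pages 1..n_pages with lapply and stack the tables.
# Needs rvest and network access; pauses between requests to be polite.
scrape_jobs <- function(n_pages, page_items = 100, delay = 1) {
  urls <- vapply(seq_len(n_pages), build_jobs_url, character(1),
                 page_items = page_items)
  tables <- lapply(urls, function(url) {
    Sys.sleep(delay)
    page <- rvest::read_html(url)
    rvest::html_table(rvest::html_node(page, "table"))
  })
  do.call(rbind, tables)
}

# jobs <- scrape_jobs(n_pages = 3)
```

build_jobs_url(1, 100) reproduces the URL from the answer exactly, so the generated URLs can be checked against what the browser sends before any scraping is run.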

Upvotes: 1
