Naressh Thiagaraj

Reputation: 1

rvest does not scrape past a certain number of results

I am trying to scrape this page: https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung

I am using rvest, and this is my code:

library(rvest)
site <- 'https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung'
website <- read_html(site)

device_label_html <- html_nodes(website,'div.device-name')
device_label <- html_text(device_label_html)
head(device_label,n=60)

When I run this code, it returns only 40 results (phones), although there should be 51. Can someone help me with this? Thank you.

Upvotes: -1

Views: 61

Answers (1)

Brian Montgomery

Reputation: 2414

The website paginates its results internally, so the first URL returns only the first page. There might be a more elegant way to do this, and I would definitely look for one if there were more than 2 pages, but this works:

library(rvest)

# Page 1: the original URL
site <- 'https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung'
website <- read_html(site)

device_label_html <- html_nodes(website,'div.device-name')
device_label <- html_text(device_label_html)

# Page 2: the same URL with ',page.2' appended
site2 <- 'https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung,page.2'
website2 <- read_html(site2)

device_label_html2 <- html_nodes(website2,'div.device-name')
device_label2 <- html_text(device_label_html2)

# Combine the results from both pages
head(c(device_label, device_label2),n=60)
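
If there were more than two pages, the same idea could be wrapped in a loop. The following is only a rough sketch, not something I have run against the site. The helper names page_url, scrape_labels and scrape_all are made up for illustration, and it assumes that later pages follow the same ',page.N' pattern as page 2 and that a page past the end either fails to load or contains no 'div.device-name' nodes. If the site instead repeats the last page, the loop would need a different stopping rule.

```r
library(rvest)

# Listing URL from the question; page 1 has no suffix.
base_url <- 'https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung'

# Hypothetical helper: build the URL for page n.
# Assumption: pages after the first use the ',page.N' suffix seen for page 2.
page_url <- function(base, n) {
  if (n == 1) base else paste0(base, ',page.', n)
}

# Hypothetical helper: return the device names found on a single page.
scrape_labels <- function(url) {
  page <- read_html(url)
  html_text(html_nodes(page, 'div.device-name'))
}

# Hypothetical helper: request pages in order and stop at the first page
# that errors or yields no device names, or once max_pages is reached.
scrape_all <- function(base, max_pages = 10) {
  labels <- character(0)
  for (n in seq_len(max_pages)) {
    found <- tryCatch(
      scrape_labels(page_url(base, n)),
      error = function(e) character(0)  # treat a failed request as "no more pages"
    )
    if (length(found) == 0) break
    labels <- c(labels, found)
  }
  labels
}
```

With these definitions, device_label <- scrape_all(base_url) would collect the names from every page it can reach, and length(device_label) can then be compared with the 51 phones expected in the question.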

Upvotes: 1
