Reputation: 147
Please note that I am very new to R and to web scraping in R, so please keep that in mind when explaining an answer.
I am trying to scrape the Date of Stay, the Title of the Review and the Review.
This is where I generate the list of URLs I want to use:
library(rvest)
library(stringr) # str_c(), str_sub()
library(dplyr) # data_frame(), bind_rows()
#GENERATING THE URLS
webpage_list <- vector(mode = "list")
#creating empty list
webpage_list
for(n in seq(from=5, to=15, by=5)){
webpage_list[[n]] <- glue::glue("https://www.sampleURL.com#REVIEWS")
}
#dropping the empty values
webpage_list[sapply(webpage_list,is.null)] <- NULL
webpage_list
Then I convert the list to a character vector and iterate through it, identifying the area on each webpage that I want scraped:
webpage_list2 <- unlist(webpage_list)
class(webpage_list2)
for(i in seq_along(webpage_list2)){
webpage <- read_html(webpage_list2[i])
results <- webpage %>% html_nodes(".oETBfkHU , ._3hDPbqWO")
print(results)
# Building the dataset
records <- vector("character", length = (length(results)))
print(records)
}
It seems to be working as I want (I think) up to this point.
for (x in seq_along(results)) {
url <- read_html(webpage_list2[x])
dateOfStay <- str_c(url %>%
html_nodes("._34Xs-BQm") %>%
html_text())
reviewTitle <- str_sub(url %>%
html_nodes(".glasR4aX")%>%
html_text())
review <- str_sub(url %>%
html_nodes(".IRsGHoPm") %>%
html_text())
records[[x]] <- data_frame(dateOfStay = dateOfStay, reviewTitle = reviewTitle, review = review)#, reviewTitle = reviewTitle, review = review
}
#Build DF
DF <- bind_rows(records)
From this I get the below error:
Error in records[[x]] <- data_frame(dateOfStay = dateOfStay, reviewTitle = reviewTitle, : more elements supplied than there are to replace
Any help would be greatly appreciated. Again, I am very new to R and to web scraping in R, so please keep that in mind when explaining an answer.
Upvotes: 0
Views: 105
Reputation: 2021
The problem can be found without doing any scraping. You are trying to put a data frame into one element of a character vector. A data frame is not a single character string (it is a list of columns), so R reports that more elements were supplied than there are to replace. You can fix it by making records a list, or by wrapping your data frame in a list so that it is assigned as a single item. I recommend making records a list.
records <- vector("character", length = (3))
records[[2]] <- data.frame(test = "A",test2 = "B")
# Error in records[[2]] <- data.frame(test = "A", test2 = "B") :
# more elements supplied than there are to replace
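# Option 1:
# vector("list", 3) creates an empty list with 3 slots;
# list(length = 3) would instead create a one-element list named "length"
records <- vector("list", length = 3)
records[[2]] <- data.frame(test = "A",test2 = "B")
records
# [[1]]
# NULL
#
# [[2]]
#   test test2
# 1    A     B
#
# [[3]]
# NULL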
# Option 2:
records <- vector("character", length = (3))
records[[2]] <- list(data.frame(test = "A",test2 = "B"))
records
# [[1]]
# [1] ""
#
# [[2]]
# [[2]][[1]]
#   test test2
# 1    A     B
#
#
# [[3]]
# [1] ""
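Applied to the loop in the question, Option 1 could look roughly like the sketch below. It uses made-up page data in place of the real read_html() calls, so fake_pages and its contents are assumptions for illustration only, and base R's do.call(rbind, ...) stands in for bind_rows() so that the snippet runs without extra packages.

```r
# Hypothetical stand-in for the scraped pages: each element holds the
# values that html_text() would return for one URL.
fake_pages <- list(
  list(dateOfStay = c("May 2020", "June 2020"),
       reviewTitle = c("Great stay", "Too noisy"),
       review = c("Clean rooms.", "Street noise all night.")),
  list(dateOfStay = "July 2020",
       reviewTitle = "Would return",
       review = "Friendly staff.")
)

# A list (not a character vector) can hold one data frame per slot.
records <- vector("list", length = length(fake_pages))

for (x in seq_along(fake_pages)) {
  page <- fake_pages[[x]]
  records[[x]] <- data.frame(
    dateOfStay = page$dateOfStay,
    reviewTitle = page$reviewTitle,
    review = page$review,
    stringsAsFactors = FALSE
  )
}

# Stack the per-page data frames into one; dplyr::bind_rows(records)
# would do the same job here.
DF <- do.call(rbind, records)
nrow(DF)
```

In the question's own loop, iterating over seq_along(webpage_list2) rather than seq_along(results) is probably what was intended, since x is used to index the URLs.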
Upvotes: 1