YasserKhalil

Reputation: 9568

Object not found in R language

I am new to R, and here is my attempt to adapt some code to scrape quotes from multiple pages:

# Load Libraries
library(rvest)      # To Scrape
library(tidyverse)  # To Manipulate Data

# Scrape Multiple Pages
for (i in 1:4){
  site_to_scrape <- read_html(paste0("http://quotes.toscrape.com/page/",i))
  temp <- site_to_scrape %>% html_nodes(".text") %>% html_text()
  content <- append(content, temp)
}

#Export Results To CSV File
write.csv(content, file = "content.csv", row.names = FALSE)

I get an "object not found" error for the content variable. How can I fix this and define the object so that it can be reused in the append line?

Upvotes: 0

Views: 115

Answers (2)

Ronak Shah

Reputation: 389175

Growing a vector in a loop is inefficient when you are scraping many pages. Instead, initialise a list before the loop, with a length you know beforehand, and fill it by index.

library(rvest)
n <- 4
content = vector('list', n)

# Scrape Multiple Pages
for (i in 1:n){
  site_to_scrape <- read_html(paste0("http://quotes.toscrape.com/page/",i))
  content[[i]] <- site_to_scrape %>%
    html_nodes(".text") %>%
    html_text()
}
write.csv(unlist(content), file = "content.csv", row.names = FALSE)
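
To see why the loop in the question fails and how preallocation avoids it, here is a minimal offline sketch. It makes no network calls: fake_page() is a hypothetical stand-in for the read_html / html_nodes / html_text chain and is not part of rvest.

```r
# Hypothetical stand-in for one scraped page (assumption: 3 quotes per page)
fake_page <- function(i) paste0("page ", i, ", quote ", 1:3)

# Start from a clean slate so the error is reproducible on a rerun
if (exists("content", inherits = FALSE)) rm(content)

# append() has to read `content` before anything has been assigned to it,
# so R stops with an "object not found" error; capture the message
err <- tryCatch(
  append(content, fake_page(1)),
  error = function(e) conditionMessage(e)
)
print(err)

# Preallocate a list with one slot per page, then fill it by index
n <- 4
content <- vector("list", n)
for (i in seq_len(n)) {
  content[[i]] <- fake_page(i)
}

# Flatten the list of character vectors into a single character vector
content <- unlist(content)
length(content)
```

The error has nothing to do with scraping itself: any name that is read before it has been assigned will trigger it.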

Another option, which needs no initialisation, is to use sapply/lapply:

all_urls <- paste0("http://quotes.toscrape.com/page/",1:4)
content <- unlist(lapply(all_urls, function(x) 
               x %>% read_html %>%  html_nodes(".text") %>% html_text()))
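
One difference between the two functions is worth knowing here. The sketch below is offline and uses a hypothetical fake_scrape() stand-in and placeholder URLs rather than the real site. When every page returns the same number of quotes, sapply simplifies the result to a matrix, while unlist(lapply(...)) always gives a flat vector.

```r
# Hypothetical stand-in for the read_html / html_nodes / html_text chain
fake_scrape <- function(url) paste(url, "quote", 1:3)

# Placeholder page identifiers, not real URLs
urls <- paste0("page/", 1:2)

# lapply returns a list; unlist flattens it to one character vector
via_lapply <- unlist(lapply(urls, fake_scrape))

# sapply simplifies equal-length results to a matrix (one column per URL)
via_sapply <- sapply(urls, fake_scrape)

is.matrix(via_sapply)

# Dropping the matrix structure recovers the same flat vector
identical(as.vector(via_sapply), via_lapply)
```

If a flat vector of quotes is the goal, unlist(lapply(...)) as used above is the more predictable choice.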

Upvotes: 2

YasserKhalil

Reputation: 9568

After searching, I found that assigning an empty object before the loop, content = c(), fixes the error:

# Load Libraries
library(rvest)      # To Scrape
library(tidyverse)  # To Manipulate Data

content = c()

# Scrape Multiple Pages
for (i in 1:4){
  site_to_scrape <- read_html(paste0("http://quotes.toscrape.com/page/",i))
  temp <- site_to_scrape %>%
    html_nodes(".text") %>%
    html_text()
  content <- append(content, temp)
}

#Export Results To CSV File
write.csv(content, file = "content.csv", row.names = FALSE)
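
As a small illustration of why this works: c() with no arguments returns NULL, and append() accepts NULL as its starting value. The offline sketch below shows this, along with a type-explicit alternative, character(0). The sample strings, the column name quote, and the temporary file are assumptions made for the example only.

```r
# c() with no arguments is NULL, which append() accepts as a starting value
content <- c()
is.null(content)
content <- append(content, c("first quote", "second quote"))

# A type-explicit alternative: an empty character vector
quotes <- character(0)
quotes <- append(quotes, c("first quote", "second quote"))
identical(content, quotes)

# Wrapping the vector in a data frame gives the CSV column a chosen name
# (the name `quote` and the temporary file are illustrative choices)
out <- tempfile(fileext = ".csv")
write.csv(data.frame(quote = quotes), file = out, row.names = FALSE)

# Read the file back to check its shape
back <- read.csv(out)
names(back)
nrow(back)
```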

Upvotes: 0
