Reputation: 195
I need to write a file from inside an lapply call. I'm scraping a large list of webpages and would like to save the output after every 100 pages or so. I use the following code:
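library(XML)       # htmlParse, xpathSApply, xmlValue
library(reshape2)  # melt (assumed source; any package providing melt for lists would do)

from <- seq(1, 100, 10)
aa <- length(url)
func1 <- function(url) {
  out <- tryCatch(
    {
      aa <<- aa - 1   # countdown of pages left, kept in the global environment
      print(aa)
      doc <- htmlParse(url)
      address <- as.data.frame(xpathSApply(doc, '//div[@class="panel-body"]', xmlValue, encoding = "UTF-8"))
      page <- cbind(address, url)
      if (aa %in% from) {
        pg <- suppressMessages(melt(cc))   # cc is only created once lapply has finished
        write.csv(pg, paste("bcc_", aa, ".csv"))
      }
    }
  )
}
cc <- lapply(url, func1)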
However, when I run this I get an error saying that object "cc" is not found. I know this can be done with a for loop, but is there a way to accomplish it using an apply function?
Upvotes: 1
Views: 1507
Reputation: 396
Build cc as an object in a new environment, outside of your lapply.
e <- new.env()
e$cc <- list()
a <- letters
b <- 1:26
# Example lapply
out <- lapply(a, function(a, b) {
  e$cc[[a]] <- b                                    # store the result in the shared environment
  if (length(e$cc) %% 10 == 0) print(length(e$cc))  # report every 10th element
  b  # Also return a value, so out is populated as well
}, b)
# [1] 10
# [1] 20
# Showing first elements of outputs
# > e$cc
#$a
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#[26] 26
# > out
#[[1]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#[26] 26
This lets you build cc inside a new R environment, so it can be inspected (and saved) partway through the lapply, while lapply still returns its usual output. It is not the most elegant solution, though.
n.b. You will need to adapt this to your own code. Also reset e$cc with e$cc <- list() before re-running if need be: after the first run the names already exist, so a second run only replaces elements and length(e$cc) no longer grows.
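The reason an environment helps here is that it is modified in place, whereas a list changed inside a function is only a local copy. A minimal sketch of the difference, where plain, env and items are hypothetical names used only for illustration:

```r
plain <- list()
env <- new.env()
env$items <- list()

invisible(lapply(1:3, function(i) {
  plain[[i]] <- i * 10      # changes a local copy; the outer list is untouched
  env$items[[i]] <- i * 10  # changes the shared environment in place
}))

n_plain <- length(plain)      # the outer list never grew
n_env <- length(env$items)    # the environment collected every element

# Reset before a second run so elements from the first run do not linger
env$items <- list()
n_after_reset <- length(env$items)
```

The same reset applies to e$cc above before re-running the lapply.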
ALTERNATIVELY: (UNTESTED!) You could try adapting your script into something like this.
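library(XML)       # htmlParse, xpathSApply, xmlValue
library(reshape2)  # melt (assumed source; the question does not say which package)

func1 <- function(url) {
  tryCatch(
    {
      doc <- htmlParse(url)
      address <- as.data.frame(xpathSApply(
        doc, '//div[@class="panel-body"]', xmlValue, encoding = "UTF-8"
      ))
      cbind(address, url)  # the page data is the value returned by func1
    }
    # No handlers were given in the question; add error = / warning = handlers here as needed.
  )
}

wrapfun <- function(urls) {
  e <- new.env()
  e$cc <- list()
  lapply(urls, function(x) {
    e$cc[[x]] <- func1(x)
    # Change the modulus to set how often to save, e.g. %% 100 saves every 100 pages.
    if (length(e$cc) %% 10 == 0) {
      pg <- suppressMessages(melt(e$cc))
      write.csv(pg, paste0("bcc_", length(e$cc), ".csv"))  # paste0 avoids spaces in the file name
    }
  })
  return(e$cc)
}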
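Since the version above is untested and needs network access, here is a small offline sketch of the same save-every-n pattern that can be run as is. It rests on some assumptions: fake_scrape is a hypothetical stand-in for func1, the names scrape_with_checkpoints, every and out_dir are made up for illustration, and do.call(rbind, ...) is swapped in for melt so that only base R is needed.

```r
# Hypothetical stand-in for func1(): one-row data frame per "URL", no network access
fake_scrape <- function(u) {
  data.frame(address = paste("address for", u), url = u, stringsAsFactors = FALSE)
}

scrape_with_checkpoints <- function(urls, every = 10, out_dir = tempdir()) {
  e <- new.env()
  e$cc <- list()             # results collected so far
  e$files <- character(0)    # checkpoint files written so far
  lapply(urls, function(u) {
    e$cc[[u]] <- fake_scrape(u)
    n <- length(e$cc)
    if (n %% every == 0) {
      # Combine everything collected so far and write a checkpoint file
      pg <- do.call(rbind, e$cc)
      path <- file.path(out_dir, paste0("bcc_", n, ".csv"))
      write.csv(pg, path, row.names = FALSE)
      e$files <- c(e$files, path)
    }
    invisible(NULL)
  })
  list(results = e$cc, files = e$files)
}

urls <- paste0("http://example.com/page", 1:25)
res <- scrape_with_checkpoints(urls, every = 10)
length(res$results)   # number of pages collected
basename(res$files)   # names of the checkpoint files written
```

Each checkpoint file contains everything collected up to that point, so only the most recent one is needed to recover from a crash. Writing only the newest batch instead would keep the files smaller, at the cost of having to stitch them together afterwards.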
Upvotes: 1