Reputation: 21
I'm trying to scrape a web page (https://aviation-safety.net/database/dblist.php?Year=1986&lang=&page=1) in R using the following code:
install.packages("rvest")
library(rvest)
aviationurl = "https://aviation-safety.net/database/dblist.php?Year=1986"
webpage = read_html(aviationurl)
# define variables of interest
variables = c("Date","Type","Registration","Operator","Fat","Location","Flag","Picture","Category")
# create sequence of numbers (as CSS for each variable contains a number)
colnums = seq(1,length(variables))
# write commands for pulling each variable into an R dataframe and executing them
eval(parse(paste(variables," = as.data.frame(html_text(html_nodes(webpage,'td:nth-child(",colnums,")')))",sep="")))
# create final table with all variables
df = cbind(parse(variables))
However, after the eval command, I get the following error message:
Error in file(filename, "r") : invalid 'description' argument
In addition: Warning message:
In if (file == "") { :
  the condition has length > 1 and only the first element will be used
If I run the paste command without eval(parse()) and manually copy and paste the resulting strings into the console, they work fine. So why won't R evaluate them properly?
I'm open to alternative suggestions, though I would still like to know why eval isn't working and whether there's anything I can do to make it work.
Thanks!
Josh
Upvotes: 2
Views: 431
Reputation: 186
For completeness: code to scrape the three pages of data on the website and bind them together into one data frame.
library(rvest)
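# base URL without the page number
url <- "https://aviation-safety.net/database/dblist.php?Year=1986&lang=&page="
urls <- paste0(url, 1:3)
# read one page and return its table as a data frame
scr <- function(url) {
  read_html(url) %>%
    html_nodes("table") %>%
    html_table() %>%
    as.data.frame()
}
# scrape each page and stack the results row-wise
df <- do.call(rbind, lapply(urls, scr))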
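The row-binding step can be tried without downloading anything. A minimal sketch, assuming a hypothetical fake_scr() that stands in for scr() and returns a small made-up data frame per page:

```r
# fake_scr() is a hypothetical stand-in for scr(); the data is made up
fake_scr <- function(page) {
  data.frame(page = page, row = 1:2)
}

pages <- 1:3

# lapply() returns a list of data frames, do.call(rbind, ...) stacks them
df <- do.call(rbind, lapply(pages, fake_scr))
```

Note that rbind() on data frames expects every page to return the same columns.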
Upvotes: 0
Reputation: 1506
I think this solves your problem.
library(rvest)
aviationurl = "https://aviation-safety.net/database/dblist.php?Year=1986"
webpage = read_html(aviationurl)
table <- as.data.frame(html_table(html_nodes(webpage, "table")))
head(table)
         date                    type registration
1 03-JAN-1986           Antonov An-2T   CCCP-06101
2 13-JAN-1986        BN-2A-6 Islander       C-GTPB
3 15-JAN-1986      Dassault Falcon 10       F-GBTC
4 15-JAN-1986 Boeing 737-2A8 Advanced       VT-EGD
5 16-JAN-1986           Antonov An-2R       SP-WON
6 18-JAN-1986   SE-210 Caravelle VI-N       HC-BAE
                             operator fat.             location
1 Tselinny gorno-khimicheski kombinat    0        near Shantobe
2                Borealis Exploration    0      Caribou Horn...
3                              Air BG    2 near Vatry/Châlon...
4                     Indian Airlines    0      Tiruchirappa...
5                                 ZUA   NA              Un-Sara
6    SAETA, op.for Aerovias Guatemala   94 near Flores-Santa...
  Var.7 pic cat
1    NA  NA  A1
2    NA  NA  A2
3    NA  NA  A1
4    NA  NA  A2
5    NA  NA  A1
6    NA  NA  A1
Another method, closer to your original approach:
library(rvest)
aviationurl = "https://aviation-safety.net/database/dblist.php?Year=1986"
webpage = read_html(aviationurl)
# define variables of interest
variables = c("Date","Type","Registration","Operator","Fat","Location","Flag","Picture","Category")
# create sequence of numbers (as CSS for each variable contains a number)
colnums = seq(1,length(variables))
library(dplyr)
table <- list()
# pull each column by its position and store it as a one-column data frame
for (i in seq_along(colnums)) {
  table[[i]] <- as.data.frame(html_text(html_nodes(webpage, paste0("td:nth-child(", colnums[i], ")"))))
}
table <- bind_cols(table)
names(table) <- variables
head(table)
         Date                    Type Registration
1 03-JAN-1986           Antonov An-2T   CCCP-06101
2 13-JAN-1986        BN-2A-6 Islander       C-GTPB
3 15-JAN-1986      Dassault Falcon 10       F-GBTC
4 15-JAN-1986 Boeing 737-2A8 Advanced       VT-EGD
5 16-JAN-1986           Antonov An-2R       SP-WON
6 18-JAN-1986   SE-210 Caravelle VI-N       HC-BAE
                             Operator Fat             Location Flag
1 Tselinny gorno-khimicheski kombinat   0        near Shantobe
2                Borealis Exploration   0      Caribou Horn...
3                              Air BG   2 near Vatry/Châlon...
4                     Indian Airlines   0      Tiruchirappa...
5                                 ZUA                  Un-Sara
6    SAETA, op.for Aerovias Guatemala  94 near Flores-Santa...
  Picture Category
1               A1
2               A2
3               A1
4               A2
5               A1
6               A1
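As for why eval(parse(...)) fails in the question: the first argument of parse() is file, so the pasted strings are treated as a file name, which matches the file(filename, "r") error. Passing them as text = should avoid that. Likewise, cbind(parse(variables)) does not look the objects up by name; mget() is one way to do that. A minimal sketch, using a shortened variable list and dummy right-hand sides as stand-ins (assumptions for illustration, not the real html_text() calls):

```r
# Shortened, hypothetical stand-in for the question's variable list
variables <- c("Date", "Type")
colnums <- seq_along(variables)

# Build one assignment string per variable, as paste() does in the question.
# The right-hand sides are dummy values instead of html_text() calls.
cmds <- paste0(variables, " <- data.frame(x = ", colnums, ")")

# parse(cmds) would pass cmds as `file`; text = parses the strings themselves.
# eval() then runs each parsed expression in turn.
eval(parse(text = cmds))

# Fetch the created objects by name and bind them column-wise
df <- do.call(cbind, unname(mget(variables)))
names(df) <- variables
```

This keeps the eval approach working, though building a list in a loop (as above) is usually easier to debug than generating code as strings.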
Upvotes: 2