Bruno Mioto

Reputation: 515

Problem scraping Eschmeyer's Catalog of Fishes with R on Windows

I would like to scrape the results from this website, as they would appear after a normal search.

The code I have is below, but it saves a local copy of the HTML file. I would like to turn it into a function for a package, without saving a copy to disk.

EDIT: It works on macOS and Linux. I just need a way to do it on Windows, because a package must work on all three operating systems.

search_cas_species <- function(species, path = getwd()) {
  
  url <- "https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp"
  page_initial <- httr::GET(url)
  content_initial <- httr::content(page_initial)
  
  POST_safe <- purrr::safely(httr::POST)
  
  data_cas_species <- list(
    "tbl" = "Species",
    "contains" = species,
    "Submit" = "Search"
  )
  
  if(!dir.exists(path)) dir.create(path)
  
  species_clean <- stringr::str_replace_all(species, '[:blank:]', '_')
  html_name <- paste0(species_clean, ".html")
  html_path <- file.path(path, html_name)
  
  search_page <- POST_safe(
    url = url,
    body = data_cas_species,
    encode = "form",
    httr::write_disk(html_path, overwrite = TRUE)
  )
  
  # Return the full path so the file can be found when `path` is not the working directory
  return(html_path)
}

library(magrittr)  # provides the %>% pipe used below

respostas <- search_cas_species("Cichla")

respostas %>%
  rvest::read_html() %>%
  xml2::xml_find_all(".//p[@class='result']") %>%
  `[`(-1) %>%
  `[`(c(FALSE, TRUE))
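
To make the selection step above easier to follow, here is a minimal sketch of the same parsing done on an HTML string held in memory instead of a file. The helper name `parse_cas_results` and the toy HTML are hypothetical and only illustrate the node selection; the real page layout may differ.

```r
library(xml2)

# Hypothetical helper: parse HTML given as a character string (no file needed)
parse_cas_results <- function(html_text) {
  doc <- xml2::read_html(html_text)
  nodes <- xml2::xml_find_all(doc, ".//p[@class='result']")
  nodes <- nodes[-1]        # drop the first matching node
  nodes[c(FALSE, TRUE)]     # keep every second node of the remainder
}

# Toy input standing in for the search result page (an assumption, not the real markup)
toy_html <- '<html><body>
<p class="result">header</p>
<p class="result">a</p>
<p class="result">b</p>
<p class="result">c</p>
<p class="result">d</p>
</body></html>'

result_text <- xml2::xml_text(parse_cas_results(toy_html))
print(result_text)
```

If the request succeeds, the body could presumably be passed to the same helper with `httr::content(response, as = "text", encoding = "UTF-8")`, which avoids `write_disk()`. That does not by itself address the error I get on Windows.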

I have already tried the following, which keeps the response in memory instead of writing it to disk, but it fails with an error.

library(dplyr)

url <- "https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp"

page_initial <- httr::GET(url)

content_initial <- httr::content(page_initial)
#> No encoding supplied: defaulting to UTF-8.

data_cas_species <- list(
  "tbl" = "Species",
  "contains" = "Cichla",
  "Submit" = "Search"
)

search_page <- httr::POST(
  url = url,
  body = data_cas_species,
  encode = "form"
  )
#> Error in curl::curl_fetch_memory(url, handle = handle): Failure when receiving data from the peer

Created on 2022-07-12 by the reprex package (v2.0.1)

sessioninfo::platform_info()
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Portuguese_Brazil.1252
#>  ctype    Portuguese_Brazil.1252
#>  tz       America/Sao_Paulo
#>  date     2022-07-12
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
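
Since the error is raised inside the request itself, a package function would probably need to catch it instead of stopping. Below is a minimal sketch using base R `tryCatch()`; `try_request` and `failing_post` are hypothetical names, and `failing_post` is only a stand-in for `httr::POST` so the example runs offline.

```r
# Hypothetical wrapper: call a request function and turn any error
# into a warning plus a NULL return value
try_request <- function(request_fun, ...) {
  tryCatch(
    request_fun(...),
    error = function(e) {
      warning("Request failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for httr::POST that always fails (an assumption for illustration)
failing_post <- function(...) {
  stop("Failure when receiving data from the peer")
}

# suppressWarnings() only keeps the demo output quiet
result <- suppressWarnings(try_request(failing_post, url = "https://example.org"))
print(is.null(result))
```

With the real request this would be called as `try_request(httr::POST, url = url, body = data_cas_species, encode = "form")`. It only reports the failure more gracefully and does not fix the underlying problem on Windows.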

Upvotes: 2

Views: 222

Answers (1)

zhang zhixin

Reputation: 11

Take a look at the R package rFishTaxa, which should meet your requirements.

devtools::install_github("Otoliths/rFishTaxa", build_vignettes = TRUE)

library("rFishTaxa")

browseVignettes('rFishTaxa')

Upvotes: 1
