Reputation: 127
I am practicing my web scraping coding in R and I cannot pass one phase no matter what website I try.
For example,
https://www.thecompleteuniversityguide.co.uk/league-tables/rankings?s=Music
My goal is to extract all 77 schools' name (Oxford to London Metropolitan)
So I tried...
library(rvest)
url_college <- "https://www.thecompleteuniversityguide.co.uk/league-tables/rankings?s=Music"
college <- read_html(url_college)
info <- html_nodes(college, css = '.league-table-institution-name')
info %>% html_nodes('.league-table-institution-name') %>% html_text()
From F12, I could find out that all schools' name is under class '.league-table-institution-name'... and that's why I wrote that in html_nodes...
What have I done wrong?
Upvotes: 0
Views: 210
Reputation: 33782
You appear to be running html_nodes()
twice: first on college
, an xml_document
(which is correct) and then on info
, a character vector, which is not correct.
Try this instead:
url_college %>%
read_html() %>%
html_nodes('.league-table-institution-name') %>%
html_text()
and then you'll need an additional step to clean up the school names; this one was suggested:
%>%
str_replace_all("(^[^a-zA-Z]+)|([^a-zA-Z]+$)", "")
Upvotes: 3