Reputation: 11
I am having difficulty loading the rvest/XML packages in R, and I am unable to run my code.
How exactly should I use rvest for web scraping?
How can I read a table from the web page "https://www.forbes.com/powerful-brands/list/"?
library(rvest)
forbs <- readHTMLTable("https://www.forbes.com/powerful-brands/list/")
head(forbs)
View(forbs)
It shows an error like this:
forbs1 <- html_text("#list_table")
Error in UseMethod("xml_text") : no applicable method for 'xml_text' applied to an object of class "character"
Upvotes: 1
Views: 125
Reputation: 2223
Here is an approach that can be considered:
library(rvest)
url <- "https://www.forbes.com/the-worlds-most-valuable-brands/#4052ca71119c"
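# html_Content is assumed to hold the page's rendered HTML as a character string
# (for example, captured from a browser session); it is not defined in this snippet
read_html(html_Content) %>% html_table()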
[[1]]
# A tibble: 51 x 6
Rank Brand `Brand Value` `1-Yr Value Change` `Brand Revenue` Industry
<int> <chr> <chr> <chr> <chr> <chr>
1 NA "" "" "" "" ""
2 1 "Apple" "$241.2 B" "17%" "$260.2 B" "Technology"
3 2 "Google" "$207.5 B" "24%" "$145.6 B" "Technology"
4 3 "Microsoft" "$162.9 B" "30%" "$125.8 B" "Technology"
5 4 "Amazon" "$135.4 B" "40%" "$260.5 B" "Technology"
6 5 "Facebook" "$70.3 B" "-21%" "$49.7 B" "Technology"
7 6 "Coca-Cola" "$64.4 B" "9%" "$25.2 B" "Beverages"
8 7 "Disney" "$61.3 B" "18%" "$38.7 B" "Leisure"
9 8 "Samsung" "$50.4 B" "-5%" "$209.5 B" "Technology"
10 9 "Louis Vuitton" "$47.2 B" "20%" "$15 B" "Luxury"
# ... with 41 more rows
# i Use `print(n = ...)` to see more rows
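The first row of the result is empty, and the value columns come back as character strings such as "$241.2 B". A minimal base R sketch of cleaning this up follows. It uses a few rows copied from the output above rather than the live page. The helper `to_number` and the new column names `value_usd_bn` and `change_pct` are made up for illustration, and reading "B" as billions is an assumption.

```r
# Sample rows copied from the printed output, including the empty first row.
brands <- data.frame(
  Rank = c(NA, 1L, 2L, 5L),
  Brand = c("", "Apple", "Google", "Facebook"),
  `Brand Value` = c("", "$241.2 B", "$207.5 B", "$70.3 B"),
  `1-Yr Value Change` = c("", "17%", "24%", "-21%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the empty placeholder row (its Rank is NA).
brands <- brands[!is.na(brands$Rank), ]

# Hypothetical helper: keep digits, the decimal point, and the minus sign,
# then convert what is left to numeric.
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# Hypothetical column names; "B" is assumed to mean billions of USD.
brands$value_usd_bn <- to_number(brands[["Brand Value"]])
brands$change_pct <- to_number(brands[["1-Yr Value Change"]])

print(brands)
```

The same steps should apply to the tibble returned by `html_table()`, since they only rely on column names and `[[`/`$` access.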
Here is another approach that can be considered:
library(RDCOMClient)
library(rvest)
url <- "https://www.forbes.com/the-worlds-most-valuable-brands/#4052ca71119c"
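# Launch Internet Explorer through COM (Windows only) and load the page
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
# Wait for the page to finish loading
Sys.sleep(15)
# Pull the rendered HTML out of the browser and parse its tables
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerHtml()
read_html(html_Content) %>% html_table()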
[[1]]
# A tibble: 51 x 6
Rank Brand `Brand Value` `1-Yr Value Change` `Brand Revenue` Industry
<int> <chr> <chr> <chr> <chr> <chr>
1 NA "" "" "" "" ""
2 1 "Apple" "$241.2 B" "17%" "$260.2 B" "Technology"
3 2 "Google" "$207.5 B" "24%" "$145.6 B" "Technology"
4 3 "Microsoft" "$162.9 B" "30%" "$125.8 B" "Technology"
5 4 "Amazon" "$135.4 B" "40%" "$260.5 B" "Technology"
6 5 "Facebook" "$70.3 B" "-21%" "$49.7 B" "Technology"
7 6 "Coca-Cola" "$64.4 B" "9%" "$25.2 B" "Beverages"
8 7 "Disney" "$61.3 B" "18%" "$38.7 B" "Leisure"
9 8 "Samsung" "$50.4 B" "-5%" "$209.5 B" "Technology"
10 9 "Louis Vuitton" "$47.2 B" "20%" "$15 B" "Luxury"
# ... with 41 more rows
# i Use `print(n = ...)` to see more rows
Here is another approach that can be considered:
library(RSelenium)
library(rvest)
url <- "https://www.forbes.com/the-worlds-most-valuable-brands/#4052ca71119c"
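# Start a Selenium Firefox container, mapping host port 4445 to container port 4444
# (shell() is Windows-only; use system() on other platforms)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
# Connect to the Selenium server and load the page
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
# Wait for the page to finish loading
Sys.sleep(15)
# Parse the rendered page source and extract its tables
remDr$getPageSource()[[1]] %>% read_html() %>% html_table()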
[[1]]
# A tibble: 51 x 6
Rank Brand `Brand Value` `1-Yr Value Change` `Brand Revenue` Industry
<int> <chr> <chr> <chr> <chr> <chr>
1 NA "" "" "" "" ""
2 1 "Apple" "$241.2 B" "17%" "$260.2 B" "Technology"
3 2 "Google" "$207.5 B" "24%" "$145.6 B" "Technology"
4 3 "Microsoft" "$162.9 B" "30%" "$125.8 B" "Technology"
5 4 "Amazon" "$135.4 B" "40%" "$260.5 B" "Technology"
6 5 "Facebook" "$70.3 B" "-21%" "$49.7 B" "Technology"
7 6 "Coca-Cola" "$64.4 B" "9%" "$25.2 B" "Beverages"
8 7 "Disney" "$61.3 B" "18%" "$38.7 B" "Leisure"
9 8 "Samsung" "$50.4 B" "-5%" "$209.5 B" "Technology"
10 9 "Louis Vuitton" "$47.2 B" "20%" "$15 B" "Luxury"
# ... with 41 more rows
# i Use `print(n = ...)` to see more rows
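On the error in the question: `html_text()` expects a parsed document or node, not a CSS selector string, so `html_text("#list_table")` fails with the `xml_text` message. Note also that `readHTMLTable()` belongs to the XML package, not rvest. The sketch below shows the usual rvest order of operations (parse, select, extract) on a small inline page, so it runs without network access. The inline HTML is a made-up stand-in and the real Forbes markup may differ. The `#list_table` selector is taken from the question, and `html_element()` assumes rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page; not the real Forbes markup.
page <- minimal_html('
  <table id="list_table">
    <tr><th>Rank</th><th>Brand</th></tr>
    <tr><td>1</td><td>Apple</td></tr>
    <tr><td>2</td><td>Google</td></tr>
  </table>')

# Select the node by CSS selector from the parsed page,
# instead of passing the selector string straight to html_text().
node <- html_element(page, "#list_table")

# Convert the selected <table> node into a tibble.
brands <- html_table(node)
print(brands)
```

With a live page, `page` would come from `read_html()` applied to the URL or to the rendered HTML captured by one of the browser-based approaches above.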
Upvotes: 0