Talha Naushad

Reputation: 57

Scraping table using html_table in R

I want to scrape the Sector Weightings Table from the following link:

http://portfolios.morningstar.com/fund/summary?t=SPY&region=usa&culture=en-US&ownerCountry=USA

The table I want is the sixth table in the page's source code. I have written the following script in R:

 library(rvest)
 turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
 turlr = read_html(turl) 
 df6<-html_table(html_nodes(turlr, 'table')[[6]], fill = TRUE) 

However, when I run the last line of the script, I get the following error message:

Error in out[j + k, ] : subscript out of bounds

Upvotes: 0

Views: 3650

Answers (1)

Prem

Reputation: 11955

The required table is laid out differently from a regular HTML table, so rvest is not able to format it into a proper table. With the XML package, however, you can do it quite easily.

library(XML)
library(dplyr)

#read required table
turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
temp_table <- readHTMLTable(turl)[[6]]

#process table to readable format
final_table <- temp_table %>%
  select(V2, V3, V4, V5) %>%
  na.omit() %>%
  `colnames<-` (c("","% Stocks","Benchmark","Category Avg")) %>%
  `rownames<-` (seq_len(nrow(.)))
final_table

Output is:

                          % Stocks Benchmark Category Avg
1                Cyclical                                
2         Basic Materials     2.79      3.16         3.22
3       Consumer Cyclical    11.06     11.42        11.15
4      Financial Services    16.39     16.50        17.22
5             Real Estate     2.24      3.18         2.00
6               Sensitive                                
7  Communication Services     3.56      3.37         3.50
8                  Energy     5.83      5.79         5.79
9             Industrials    10.37     10.89        11.70
10             Technology    22.16     21.41        19.72
11              Defensive                                
12     Consumer Defensive     8.20      7.60         8.56
13             Healthcare    14.24     13.57        14.57
14              Utilities     3.15      3.11         2.59
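
If you want to try the clean-up step without hitting the site, here is a minimal sketch that runs the same select / na.omit / rename steps on a small made-up data frame. The mock `temp_table`, its values, the "Sector" column name and the numeric conversion at the end are my assumptions for illustration; the raw columns that readHTMLTable returns for the real page may look different.

```r
library(dplyr)

# Hypothetical stand-in for readHTMLTable(turl)[[6]]; not the real raw data.
temp_table <- data.frame(
  V1 = c(NA, "", "", NA, ""),
  V2 = c(NA, "Cyclical", "Basic Materials", NA, "Energy"),
  V3 = c(NA, "", "2.79", NA, "5.83"),
  V4 = c(NA, "", "3.16", NA, "5.79"),
  V5 = c(NA, "", "3.22", NA, "5.79"),
  stringsAsFactors = FALSE
)

# Keep the useful columns and drop rows that contain NA.
final_table <- temp_table %>%
  select(V2, V3, V4, V5) %>%
  na.omit()

# "Sector" is an assumed label for the first column.
names(final_table) <- c("Sector", "% Stocks", "Benchmark", "Category Avg")
rownames(final_table) <- NULL

# Optional extra step: turn the text columns into numbers.
# Empty strings (the group header rows) become NA.
num_cols <- c("% Stocks", "Benchmark", "Category Avg")
final_table[num_cols] <- lapply(final_table[num_cols], as.numeric)

final_table
```

The conversion at the end is not part of the script above; it is only useful if you plan to do arithmetic on the weightings afterwards.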

Hope it helps!

Upvotes: 2
