Reputation: 57
I want to scrape the Sector Weightings Table from the following link:
http://portfolios.morningstar.com/fund/summary?t=SPY®ion=usa&culture=en-US&ownerCountry=USA
The table i want is table 6 in the website's source code. I have the following script written in R:
library(rvest)
turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
turlr = read_html(turl)
df6<-html_table(html_nodes(turlr, 'table')[[6]], fill = TRUE)
However when i run the last line of the script i get the following error message
Error in out[j + k, ] : subscript out of bounds
Upvotes: 0
Views: 3650
Reputation: 11955
Since the required table is designed in a different way rvest
is not able to format it into proper table. But using XML
package you can do it quite easily.
library(XML)
library(dplyr)
#read required table
turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
temp_table <- readHTMLTable(turl)[[6]]
#process table to readable format
final_table <- temp_table %>%
select(V2, V3, V4, V5) %>%
na.omit() %>%
`colnames<-` (c("","% Stocks","Benchmark","Category Avg")) %>%
`rownames<-` (seq_len(nrow(.)))
final_table
Output is:
% Stocks Benchmark Category Avg
1 Cyclical
2 Basic Materials 2.79 3.16 3.22
3 Consumer Cyclical 11.06 11.42 11.15
4 Financial Services 16.39 16.50 17.22
5 Real Estate 2.24 3.18 2.00
6 Sensitive
7 Communication Services 3.56 3.37 3.50
8 Energy 5.83 5.79 5.79
9 Industrials 10.37 10.89 11.70
10 Technology 22.16 21.41 19.72
11 Defensive
12 Consumer Defensive 8.20 7.60 8.56
13 Healthcare 14.24 13.57 14.57
14 Utilities 3.15 3.11 2.59
Hope it helps!
Upvotes: 2