Carolingian

Reputation: 1

How to turn rvest output into table

Brand new to R, so I'll try my best to explain this. I've been playing with data scraping using the "rvest" package. In this example, I'm scraping US state populations from a table on Wikipedia. The code I used is:

library(rvest)
statepop = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population")
forecasthtml = html_nodes(statepop, "td")
forecasttext = html_text(forecasthtml)
forecasttext

The resulting output was as follows:

[2] "7000100000000000000♠1"                                        
[3] " California"                                                  
[4] "39,250,017"                                                   
[5] "37,254,503"                                                   
[6] "7001530000000000000♠53"                                       
[7] "738,581"                                                      
[8] "702,905"                                                      
[9] "12.15%"                                                       
[10] "7000200000000000000♠2"                                        
[11] "7000200000000000000♠2"                                        
[12] " Texas"                                                       
[13] "27,862,596"                                                   
[14] "25,146,105"                                                   
[15] "7001360000000000000♠36"                                       
[16] "763,031"                                                      
[17] "698,487"                                                      
[18] "8.62%"

How can I turn these strings of text into a table that is set up similar to the way it is presented on the original Wikipedia page (with columns, rows, etc)?

Upvotes: 0

Views: 547

Answers (1)

Dave2e

Reputation: 24139

Try rvest's html_table function, which parses an HTML table directly into a data frame.
Note that there are five tables on the page, so you will need to specify which one to parse.

library(rvest)

statepop = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population")
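# find all of the tables on the page
tables <- html_nodes(statepop, "table")
# convert the first table into a data frame; [[1]] selects the node itself,
# whereas tables[1] would make html_table() return a list of data frames
table1 <- html_table(tables[[1]])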
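
The population columns will most likely come back as text, because the thousands separators stop them from being read as numbers. Here is a minimal offline sketch of the same approach plus that cleanup step. It assumes a small inline table as a stand-in for the Wikipedia page: the header names and the to_number helper are made up for illustration, and only the four population figures are taken from the question's output.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia table so the sketch runs without
# network access; the header names here are assumptions, not the page's own.
html <- '<table>
  <tr><th>State</th><th>Estimate</th><th>Census</th></tr>
  <tr><td>California</td><td>39,250,017</td><td>37,254,503</td></tr>
  <tr><td>Texas</td><td>27,862,596</td><td>25,146,105</td></tr>
</table>'

page <- read_html(html)
tables <- html_nodes(page, "table")

# Passing a single node (double brackets) gives back one data frame.
pop <- html_table(tables[[1]])

# Hypothetical helper: drop the commas, then convert the text to numbers.
to_number <- function(x) as.numeric(gsub(",", "", x))
pop[[2]] <- to_number(pop[[2]])
pop[[3]] <- to_number(pop[[3]])

str(pop)
```

On the real page you would probably also need to check which of the tables holds the state data and which columns need converting, since the layout of the article can change over time.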

Upvotes: 2
