Brand new to R, so I'll try my best to explain this. I've been playing with data scraping using the "rvest" package. In this example, I'm scraping US state populations from a table on Wikipedia. The code I used is:
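library(rvest)
#download and parse the Wikipedia page
statepop = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population")
#select every table cell (<td>) on the page
forecasthtml = html_nodes(statepop, "td")
#extract the text of each cell as one long character vector
forecasttext = html_text(forecasthtml)
forecasttext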
The resulting output (excerpt) was as follows:
[2] "7000100000000000000♠1"
[3] " California"
[4] "39,250,017"
[5] "37,254,503"
[6] "7001530000000000000♠53"
[7] "738,581"
[8] "702,905"
[9] "12.15%"
[10] "7000200000000000000♠2"
[11] "7000200000000000000♠2"
[12] " Texas"
[13] "27,862,596"
[14] "25,146,105"
[15] "7001360000000000000♠36"
[16] "763,031"
[17] "698,487"
[18] "8.62%"
How can I turn these strings of text into a table that is set up similar to the way it is presented on the original Wikipedia page (with columns, rows, etc)?
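Because html_text() returns one string per <td> cell, the row boundaries are lost. If every row has the same number of cells (the output above suggests nine per state, but that is an assumption about this particular table), one possible way to rebuild the rows from the flat vector is to fold it with matrix(byrow = TRUE). A minimal sketch using the values shown above; the first California cell is not shown in the output, so it is filled with NA here:

```r
# Excerpt of the flat cell vector returned by html_text(), copied from the
# output above. The first cell of the California row was not shown, so it
# is a placeholder NA (hypothetical, for illustration only).
cells <- c(
  NA_character_, "7000100000000000000\u26601", " California",
  "39,250,017", "37,254,503", "7001530000000000000\u266053",
  "738,581", "702,905", "12.15%",
  "7000200000000000000\u26602", "7000200000000000000\u26602", " Texas",
  "27,862,596", "25,146,105", "7001360000000000000\u266036",
  "763,031", "698,487", "8.62%"
)

n_cols <- 9  # assumed number of <td> cells per table row

# Guard against a ragged table: the vector must split evenly into rows
stopifnot(length(cells) %% n_cols == 0)

# Fill row by row so each run of nine consecutive cells becomes one row
m <- matrix(cells, ncol = n_cols, byrow = TRUE)
state_df <- as.data.frame(m, stringsAsFactors = FALSE)
state_df
```

This breaks as soon as one row has a merged or missing cell, which is why parsing the table structure directly is usually more robust.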
Try using rvest's html_table function.
Note that there are five tables on the page, so you will need to specify which one you want to parse.
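Besides picking a table by position, you can select it with a CSS selector, which keeps working if tables are added or reordered. Data tables on Wikipedia usually carry the class wikitable, but that is an assumption about the page markup, which can change. The sketch below uses a small inline HTML string in place of the live page so it runs offline (read_html() accepts literal HTML); html_element() and html_elements() require rvest 1.0 or later.

```r
library(rvest)

# Small stand-in for the Wikipedia page: a non-data table, then the data table
page <- read_html('
<html><body>
  <table class="infobox"><tr><td>not the data</td></tr></table>
  <table class="wikitable">
    <tr><th>State</th><th>Population</th></tr>
    <tr><td>California</td><td>39,250,017</td></tr>
    <tr><td>Texas</td><td>27,862,596</td></tr>
  </table>
</body></html>')

# Option 1: by position. [[ ]] extracts a single node, so html_table()
# returns one data frame rather than a list of them.
all_tables <- html_elements(page, "table")
by_index <- html_table(all_tables[[2]])

# Option 2: by CSS class, independent of where the table sits on the page
by_class <- html_table(html_element(page, "table.wikitable"))

by_class
```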
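library(rvest)
statepop <- read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population")
#find all of the tables on the page
tables <- html_nodes(statepop, "table")
#convert the first table into a data frame; [[1]] extracts a single node,
#whereas [1] would make html_table() return a list containing a data frame
#(older rvest versions may need fill = TRUE for tables with merged cells)
table1 <- html_table(tables[[1]])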
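The parsed table may still contain the hidden Wikipedia sort keys seen in the question (for example "7000100000000000000♠1"), and the population figures come back as text with thousands separators. A minimal cleanup sketch, assuming each sort key ends with the ♠ character (U+2660); the data frame and its column names below are a hypothetical stand-in for the html_table() result, built from values in the question:

```r
# Hypothetical stand-in for a data frame returned by html_table();
# the column names are illustrative, not the real Wikipedia headers.
raw <- data.frame(
  rank  = c("7000100000000000000\u26601", "7000200000000000000\u26602"),
  state = c(" California", " Texas"),
  pop   = c("39,250,017", "27,862,596"),
  share = c("12.15%", "8.62%"),
  stringsAsFactors = FALSE
)

# Drop everything up to and including the spade that ends a sort key;
# values without a spade are returned unchanged
strip_sortkey <- function(x) sub("^.*\u2660", "", x)

# Remove thousands separators and percent signs, then convert to numbers
to_number <- function(x) as.numeric(gsub("[,%]", "", x))

clean <- data.frame(
  rank  = as.integer(strip_sortkey(raw$rank)),
  state = trimws(raw$state),
  pop   = to_number(raw$pop),
  share = to_number(raw$share) / 100,  # "12.15%" becomes 0.1215
  stringsAsFactors = FALSE
)

str(clean)
```

Applying the same two helpers column by column to table1 should give a numeric data frame, provided the real columns follow the patterns shown in the question.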