Reputation: 38155
Here's my code for reading the tables, but the tables that come back have NULL names. Is there a better way to get the land area of each state in square miles, without the commas in the numbers? My idea was to extract the second table and convert it to a data.frame, but since the tables have NULL names I'm not sure how to proceed, or whether there's a better method.
require("XML")
url <- "http://simple.wikipedia.org/wiki/List_of_U.S._states_by_area"
wiki_page <- readLines(url)
length(wiki_page)
tables <- readHTMLTable(url)
Here's a sample output:
> tables
$`NULL`
Rank State km² miles²
1 1 Alaska 1,717,854 663,267
2 2 Texas 696,621 268,581
3 3 California 423,970 163,696
4 4 Montana 380,838 147,042
5 5 New Mexico 314,915 121,589
6 6 Arizona 295,254 113,998
7 7 Nevada 286,351 110,561
8 8 Colorado 269,601 104,094
9 9 Oregon 254,805 98,381
....
Upvotes: 2
Views: 858
Reputation: 121568
You should read the section headings with XPath and assign them as names to the tables:
library(XML)
url <- "http://simple.wikipedia.org/wiki/List_of_U.S._states_by_area"
doc <- htmlParse(url)
# grab the section headings; drop the 4th one, which has no matching table
nn <- xpathSApply(doc, '//*[@class="mw-headline"]', xmlValue)[-4]
tabs <- readHTMLTable(url)
names(tabs) <- nn
Check the result:
str(tabs, max.level = 1)
# List of 3
#  $ Total area:'data.frame': 50 obs. of 4 variables:
#  $ Land area :'data.frame': 50 obs. of 4 variables:
#  $ Water area:'data.frame': 50 obs. of 5 variables:
Then strip the commas and convert the numeric columns, assigning the result back:
convert_num <- function(x) as.numeric(gsub(',', '', x))
tabs <- lapply(tabs, function(y) {
  y[, -c(1, 2)] <- sapply(y[, -c(1, 2)], convert_num)
  y
})
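If the goal is just the square-mile figures without commas, the comma-stripping step can be tried in isolation. A self-contained sketch on a toy data frame (the real column names come from the scraped Wikipedia table and may differ):

```r
# Toy data frame mimicking two rows of the scraped "Land area" table;
# the column name "miles2" is made up for this example.
land <- data.frame(State  = c("Alaska", "Texas"),
                   miles2 = c("663,267", "268,581"),
                   stringsAsFactors = FALSE)

# gsub() removes the thousands separators, as.numeric() converts the result
land$miles2 <- as.numeric(gsub(",", "", land$miles2))
land$miles2
# [1] 663267 268581
```

The same `gsub(",", "", x)` pattern is what `convert_num` above applies across every non-character column of each table.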
Upvotes: 1