Zhaochen He

Reputation: 660

R: readHTMLTable returns empty list

I'm trying to import the data from the website below, but readHTMLTable returns an empty list. It's a simple HTML table, so it should be amenable to the readHTMLTable function in the XML package. Please advise.

require(XML)
url = 'https://www.archives.gov/federal-register/electoral-college/allocation.html'
table = readHTMLTable(url,header = T,stringsAsFactors=F)

Upvotes: 0

Views: 508

Answers (2)

Swapnil

Reputation: 164

Here is a solution using the rvest package:

library(tidyverse)
library(rvest)

read_html("https://www.archives.gov/federal-register/electoral-college/allocation.html") %>% # read the html page
  html_nodes("table") %>% # extract nodes which contain a table
  .[5] %>% # select the node which contains the relevant table
  html_table(trim = TRUE) # extract the table (returns a list holding one data frame)
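
To find which position holds the table you want, it can help to look at the node set before indexing into it. The following is a minimal sketch on an inline HTML string (a made-up stand-in, not the real page), assuming rvest is installed. Indexing with [[ ]] instead of [ ] hands html_table() a single node, so you get one data frame back rather than a list of length one.

```r
library(rvest)

# Hypothetical two-table page standing in for the real site
page <- read_html(paste0(
  "<html><body>",
  "<table><tr><td>layout</td></tr></table>",
  "<table>",
  "<tr><th>State</th><th>Votes</th></tr>",
  "<tr><td>Alabama</td><td>9</td></tr>",
  "<tr><td>Alaska</td><td>3</td></tr>",
  "</table>",
  "</body></html>"
))

tables <- html_nodes(page, "table")  # every <table> node on the page
length(tables)                       # how many candidates there are

# [[2]] selects one node, so html_table() returns a single data frame
df <- html_table(tables[[2]])
df
```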

Upvotes: 1

Maurits Evers

Reputation: 50718

You can do the following:

library(XML)
library(RCurl)

# Download the page over https, then parse every table in it
URL <- "https://www.archives.gov/federal-register/electoral-college/allocation.html"
lst <- readHTMLTable(getURL(URL))

# Remove NULL elements in lst
lst <- Filter(Negate(is.null), lst)

Upon inspection, we see that the main table is element 4 in lst:

df <- lst[[4]]
df
#                  State Number of Electoral Votes
#1               Alabama                         9
#2                Alaska                         3
#3               Arizona                        11
#4              Arkansas                         6
#5            California                        55
#6              Colorado                         9
#7           Connecticut                         7
#8              Delaware                         3
#9  District of Columbia                         3
#10              Florida                        29
#11              Georgia                        16
#12               Hawaii                         4
#13                Idaho                         4
#14             Illinois                        20
#15              Indiana                        11
#16                 Iowa                         6
#17               Kansas                         6
#18             Kentucky                         8
#19            Louisiana                         8
#20                Maine                         4
#21             Maryland                        10
#22        Massachusetts                        11
#23             Michigan                        16
#24            Minnesota                        10
#25          Mississippi                         6
#26             Missouri                        10
#27              Montana                         3
#28             Nebraska                         5
#29               Nevada                         6
#30        New Hampshire                         4
#31           New Jersey                        14
#32           New Mexico                         5
#33             New York                        29
#34       North Carolina                        15
#35         North Dakota                         3
#36                 Ohio                        18
#37             Oklahoma                         7
#38               Oregon                         7
#39         Pennsylvania                        20
#40         Rhode Island                         4
#41       South Carolina                         9
#42         South Dakota                         3
#43            Tennessee                        11
#44                Texas                        38
#45                 Utah                         6
#46              Vermont                         3
#47             Virginia                        13
#48           Washington                        12
#49        West Virginia                         5
#50            Wisconsin                        10
#51              Wyoming                         3
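
The "upon inspection" step can be sketched in base R as follows. The list here is a small made-up stand-in for what readHTMLTable returns (the names and values are hypothetical); picking the element with the most rows is only a heuristic, so check the result by eye.

```r
# Hypothetical stand-in for the list returned by readHTMLTable
lst <- list(
  nav    = NULL,
  small  = data.frame(a = 1:2),
  main   = data.frame(State = c("Alabama", "Alaska", "Arizona"),
                      Votes = c(9, 3, 11)),
  footer = NULL
)

# Drop the NULL entries, as in the answer above
lst <- Filter(Negate(is.null), lst)

# One row per remaining table: number of rows, number of columns
dims <- t(sapply(lst, dim))
dims

# Heuristic: the main table is usually the one with the most rows
main_idx <- which.max(sapply(lst, nrow))
df <- lst[[main_idx]]
df
```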

Your approach does not work because the download that readHTMLTable performs when it is given a URL cannot fetch over https. You need to download the page with RCurl first and then pass the downloaded HTML to readHTMLTable.
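
To see that readHTMLTable is happy once it is handed HTML text rather than an https URL, here is a minimal sketch that needs no network access. The html string is a made-up stand-in for what getURL(URL) would return, and the explicit htmlParse(..., asText = TRUE) step is my addition for clarity.

```r
library(XML)

# Hypothetical stand-in for the string that getURL(URL) would return
html <- paste0(
  "<html><body><table>",
  "<tr><th>State</th><th>Votes</th></tr>",
  "<tr><td>Alabama</td><td>9</td></tr>",
  "<tr><td>Alaska</td><td>3</td></tr>",
  "</table></body></html>"
)

# Parse the text ourselves, so no download is attempted
doc <- htmlParse(html, asText = TRUE)
lst <- readHTMLTable(doc, stringsAsFactors = FALSE)
str(lst[[1]])
```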

Upvotes: 0
