fede_luppi

Reputation: 1171

Grab HTML table using XML

I am trying to read an HTML table using the XML package. It looks easy, but I haven't managed to get it working: whatever I try, R always names the columns V1, V2, V3, …

This is the code:

require(XML)

tbl <- readHTMLTable("http://facedata.ornl.gov/ornl/npp_98-08.html",
header = c("year","ring","CO2", "stem","root","leaf","fine root", "NPP"), 
skip.rows=c(1,2),colClasses=c(rep("factor",3),rep("numeric",5)))

Many thanks for your help

Upvotes: 1

Views: 104

Answers (1)

jdharrison

Reputation: 30465

The first row of the table is causing trouble. It may be easiest to remove it:

library(XML)
appURL <- "http://facedata.ornl.gov/ornl/npp_98-08.html"
doc <- htmlParse(appURL)
removeNodes(doc["//table/tr[1]"]) # remove the first row with the troublesome header
myTable <- readHTMLTable(doc, which = 1)

> head(myTable)
  Year Plot  CO2 Stem Coarse Root Leaf Fine Root Total NPP
1 1998    1 elev 1540         127  362       168      2197
2 1998    2 elev 1487         139  418       175      2219
3 1998    3  amb 1085         112  333       231      1762
4 1998    4  amb 1204         113  368       185      1870
5 1998    5  amb 1136         109  382        56      1683
6 1999    1 elev 1218          98  475       295      2086
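
If you would rather use your own column names, as in the question, one option is to drop both the title row and the page's header row, read the rest with header = FALSE, and assign the names yourself. The sketch below is only an illustration: it parses a small made-up HTML string (the `html` variable and its three columns are stand-ins, not the real page) so that it runs without network access, and it assumes the real table has the same layout of one title row followed by one header row.

```r
library(XML)

# Hypothetical stand-in for the page: a title row spanning all columns,
# then the header row, then the data rows.
html <- '<table>
  <tr><td colspan="3">NPP summary</td></tr>
  <tr><th>Year</th><th>CO2</th><th>Total NPP</th></tr>
  <tr><td>1998</td><td>elev</td><td>2197</td></tr>
  <tr><td>1998</td><td>amb</td><td>1762</td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# Remove the title row and the original header row in one go
removeNodes(doc['//table/tr[position() <= 2]'])

# Read the remaining rows as plain data; columns come back as V1, V2, V3
tbl <- readHTMLTable(doc, which = 1, header = FALSE, stringsAsFactors = FALSE)

# Assign the column names you want
names(tbl) <- c("year", "CO2", "NPP")

# Convert the measurement column to numeric after reading
tbl$NPP <- as.numeric(as.character(tbl$NPP))
```

For the real page you would pass appURL to htmlParse instead of the string and supply all eight names; converting the columns after reading avoids having to get colClasses right up front.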

Upvotes: 1
