Reputation: 4109
I'm trying to parse this XML table, but I'm having trouble counting the number of "var" nodes. My code so far is below. I would like to replace the hard-coded 16597 with a value computed from the data, so that I can reuse this code for other, similar tables. I need to do this in R, not in XPath.
require(RCurl)
require(XML)
url = "http://api.census.gov/data/2000/sf3/variables.xml"
doc = xmlParse(url)
root = xmlRoot(doc)
xml.data = xmlToList(doc)
id = NULL
label = NULL
concept = NULL
for(i in 1:16597){
id[i] = xml.data[[1]][[(i+2)]][["id"]]
label[i] = xml.data[[1]][[(i+2)]][["label"]]
concept[i] = xml.data[[1]][[(i+2)]][["concept"]]
}
scraped.data = data.frame(id, label, concept)
Based on this question, I tried the following, but it returned 0.
doc <- xmlTreeParse(url)
xpathApply(xmlRoot(doc),path="count(//vars)",xmlValue)
Where is my misunderstanding?
Upvotes: 0
Views: 2118
Reputation: 2225
You can avoid the loop and just "rbind" your list using ldply from the plyr package.
library(plyr)  # provides ldply
y <- ldply(xml.data[[1]], "rbind")
dim(y)
[1] 16599 6
head(y)
  .id id        label
1 var for       Census API FIPS 'for' clause
2 var in        Census API FIPS 'in' clause
3 var PCT022034 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995: In an MSA/PMSA in 1995:
4 var PCT022035 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995: In an MSA/PMSA in 1995: Central city
5 var PCT022032 Total: Not living in an MSA/PMSA in 2000: Different house in 1995:
6 var PCT022033 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995:
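If you would rather stay in base R, the count itself can come from the list that xmlToList returned, so nothing needs to be hard-coded. The following is only a sketch: `vars` is a made-up stand-in for xml.data[[1]] (the ids and labels are invented, only the shape is meant to match), and `get.field` is a hypothetical helper, not part of the XML package.

```r
# Stand-in for xml.data[[1]] from the question: a list of entries named "var".
# The ids, labels and concepts are invented; only the shape is assumed to match.
vars <- list(
  var = c(id = "for", label = "Census API FIPS 'for' clause"),
  var = c(id = "in", label = "Census API FIPS 'in' clause"),
  var = c(id = "X0001", label = "Total", concept = "Example concept A"),
  var = c(id = "X0002", label = "Total: Male", concept = "Example concept A")
)

# Count the "var" entries instead of hard-coding the number.
n.var <- sum(names(vars) == "var")

# Hypothetical helper: return one field of an entry, or NA if it is missing.
get.field <- function(x, field) {
  if (field %in% names(x)) x[[field]] else NA_character_
}

# Build one column per field, without growing vectors inside a loop.
fields <- c("id", "label", "concept")
cols <- lapply(fields, function(f) {
  vapply(vars, get.field, character(1), field = f, USE.NAMES = FALSE)
})
names(cols) <- fields
scraped.data <- data.frame(cols, stringsAsFactors = FALSE)

print(n.var)
print(scraped.data)
```

With the real data you would set vars <- xml.data[[1]]. If you want to drop the two leading 'for' and 'in' entries, as the i+2 offset in the question does, tail(vars, -2) should do it before building the data frame.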
Upvotes: 1