emorystudent

Reputation: 63

Problems in parsing xml to R data frame?

I am a novice R user and definitely new to the XML format, so forgive me if there is an obvious answer to this question.

I am trying to create a data frame from specific elements of an XML file, and have two questions.

  1. When I read the xml file content from a URL into R (I use htmlTreeParse), it appears to be one long string instead of the usual format I see with xml files. I tried using other URLs and didn't have that problem. Does this have to do with the series of "??@@@" in the middle of the xml content? (URL: http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0).

  2. I'm a little lost on how to get the XML content into a data frame, pull out certain parts of it, and assign them to different variables.

I've attached my R code so far in case it's helpful.

Thank you, and I appreciate any insight you all might have! Again, my apologies if the answer is very obvious.

MY R CODE:

xml.url <- "http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0"

xmlfile <- htmlTreeParse(xml.url)

Upvotes: 2

Views: 206

Answers (1)

jdharrison

Reputation: 30425

The website tailors its response to who it thinks is asking, so you need to ask it explicitly for XML. You may also need to supply a user agent. Both can be done with RCurl:

library(XML)
library(RCurl)

xml.url <- "http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0"

# Present the request as coming from a browser that accepts XML
myAgent <- "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0"
myAccept <- "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# Fetch the raw response, sending the custom User-Agent and Accept headers
xData <- getURL(xml.url, useragent = myAgent, encoding = "UTF-8",
                httpheader = c(Accept = myAccept))

# Parse the returned text into a document tree
xmlfile <- htmlParse(xData)
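
For the second part of the question, a minimal sketch of getting parsed XML into a data frame might look like the following. The inline document and its element names (itinerary, duration, walkTime, transfers) are made up for illustration and are not taken from the real response, so adjust the XPath expressions to whatever the service actually returns.

```r
library(XML)

# Hypothetical, heavily simplified response; the real element names may differ
xml_text <- '<response>
  <plan>
    <itineraries>
      <itinerary><duration>2400</duration><walkTime>600</walkTime><transfers>1</transfers></itinerary>
      <itinerary><duration>3000</duration><walkTime>300</walkTime><transfers>0</transfers></itinerary>
    </itineraries>
  </plan>
</response>'

doc <- xmlParse(xml_text, asText = TRUE)

# One row per <itinerary> node, one column per child element
itins <- getNodeSet(doc, "//itinerary")
df <- xmlToDataFrame(nodes = itins, stringsAsFactors = FALSE)

# Values arrive as character strings; convert the numeric columns
df[] <- lapply(df, as.numeric)

# Or pull a single field into its own variable with XPath
durations <- as.numeric(xpathSApply(doc, "//itinerary/duration", xmlValue))
```

xmlToDataFrame is convenient when every node has the same flat set of children; for nested or irregular nodes, pulling each field separately with xpathSApply is usually easier to control.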

Alternatively, if you don't ask for XML, the site returns JSON, which you can parse with RJSONIO or something similar:

library(RJSONIO)
jData <- fromJSON(xml.url)
> names(jData)
[1] "requestParameters" "plan"              "error"             "debug"            
> jData$requestParameters
                                  date                                   mode
                          "03-21-2014"                         "TRANSIT,WALK"
                              arriveBy                  showIntermediateStops
                               "false"                                "false"
                             fromPlace                              itinIndex
"33.87725673930016,-84.46014404296875"                                    "0"
                               toPlace                                   time
"33.74946419232578,-84.38873291015625"                               "1:13pm"
                       maxWalkDistance
                                 "750"

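Going the JSON route, one possible way to turn a piece of the parsed result into a data frame is sketched below. To keep it runnable without the network, it parses a small inline string that copies a few of the requestParameters values shown above; the variable names params and params_df are just illustrative.

```r
library(RJSONIO)

# Small inline sample mirroring part of the requestParameters output
json_text <- '{"requestParameters": {"date": "03-21-2014", "mode": "TRANSIT,WALK", "maxWalkDistance": "750"}}'

jData <- fromJSON(json_text)
params <- jData$requestParameters

# as.list() turns the named vector into a named list, which becomes a one-row data frame
params_df <- as.data.frame(as.list(params), stringsAsFactors = FALSE)

# Everything is read as character; convert fields that are really numbers
params_df$maxWalkDistance <- as.numeric(params_df$maxWalkDistance)
```

The same pattern (index into the list, then as.data.frame on the piece you need) should carry over to other parts of jData, though deeper nested elements will likely need an lapply over the list first.
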
Upvotes: 3
