Reputation: 63
I am a novice R user and definitely new to the XML format, so forgive me if there is an obvious answer to this question.
I am trying to create a data frame from specific elements of an XML file, and I have two questions.
When I read the xml file content from a URL into R (I use htmlTreeParse), it appears to be one long string instead of the usual format I see with xml files. I tried using other URLs and didn't have that problem. Does this have to do with the series of "??@@@" in the middle of the xml content? (URL: http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0).
I'm a little lost on how to get the XML content into a data frame, grab certain parts of it, and assign them to different variables.
I've attached my R code so far in case it's helpful.
Thank you, and I appreciate any insight you all might have! Again, my apologies if the answer is very obvious.
MY R CODE:
xml.url <- "http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0"
xmlfile <- htmlTreeParse(xml.url)
Upvotes: 2
Views: 206
Reputation: 30425
The website tailors its content depending on who it thinks is asking.
You need to ask it to send you XML content, and you may also need to give it a user agent. Both can be done with RCurl:
library(XML)
library(RCurl)
xml.url <- "http://opentrip.atlantaregion.com/otp-rest-servlet/ws/plan?&fromPlace=33.87725673930016%2C-84.46014404296875&toPlace=33.74946419232578%2C-84.38873291015625&time=1%3A13pm&date=03-21-2014&mode=TRANSIT%2CWALK&maxWalkDistance=750&arriveBy=false&showIntermediateStops=false&itinIndex=0"
myAgent <- "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0"
myAccept <- "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
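# Request the page with an explicit user agent and an Accept header that asks for XML
xData <- getURL(xml.url, useragent = myAgent, encoding = "UTF-8",
                httpheader = c(Accept = myAccept))
# Parse the returned text into a document tree
xmlfile <- htmlParse(xData)  # optionally: htmlParse(xData, encoding = "UTF-8")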
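Once the content is parsed, XPath queries are one way to pull specific pieces into a data frame. The following is only a minimal sketch: the XML fragment and its element names (`leg`, `mode`, `distance`) are made up for illustration and are an assumption about the layout, not the service's documented schema, so inspect the real document and adjust the paths.

```r
library(XML)

# Hypothetical fragment standing in for the real response;
# the element names below are assumptions, not the service's actual schema.
xml_text <- '<response>
  <plan>
    <itineraries>
      <itinerary>
        <legs>
          <leg><mode>WALK</mode><distance>412.5</distance></leg>
          <leg><mode>BUS</mode><distance>9630.2</distance></leg>
        </legs>
      </itinerary>
    </itineraries>
  </plan>
</response>'

# Parse the string into an internal document that supports XPath
doc <- xmlParse(xml_text, asText = TRUE)

# One column per XPath query; xmlValue extracts the text of each matched node
legs <- data.frame(
  mode     = xpathSApply(doc, "//leg/mode", xmlValue),
  distance = as.numeric(xpathSApply(doc, "//leg/distance", xmlValue)),
  stringsAsFactors = FALSE
)

print(legs)
```

Individual values can then be assigned to variables in the usual way, e.g. `first_mode <- legs$mode[1]`.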
Alternatively, if you don't ask it for XML, it will return JSON, which you can parse using RJSONIO or something similar:
library(RJSONIO)
jData <- fromJSON(xml.url)
> names(jData)
[1] "requestParameters" "plan" "error" "debug"
> jData$requestParameters
                                  date                                   mode
                          "03-21-2014"                         "TRANSIT,WALK"
                              arriveBy                  showIntermediateStops
                               "false"                                "false"
                             fromPlace                              itinIndex
"33.87725673930016,-84.46014404296875"                                    "0"
                               toPlace                                   time
"33.74946419232578,-84.38873291015625"                               "1:13pm"
                       maxWalkDistance
                                 "750"
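From there, the parsed JSON can be turned into a data frame or into individual variables. A minimal sketch, assuming the parsed result looks like the output above; the inline JSON string is a cut-down stand-in using three of the values shown, so the example runs without hitting the server:

```r
library(RJSONIO)

# Cut-down stand-in for the server response (values taken from the output above)
json_text <- '{"requestParameters": {"date": "03-21-2014",
                                     "mode": "TRANSIT,WALK",
                                     "maxWalkDistance": "750"}}'

jData <- fromJSON(json_text, asText = TRUE)

# unlist() gives a named character vector whether or not fromJSON simplified it
params <- unlist(jData$requestParameters)

# Long-format data frame: one row per request parameter
params_df <- data.frame(
  parameter = names(params),
  value     = unname(params),
  stringsAsFactors = FALSE
)

# Individual values can be pulled out by name and converted as needed
max_walk <- as.numeric(params[["maxWalkDistance"]])

print(params_df)
```

The same pattern should apply to the `plan` element, though its nesting is deeper and needs to be inspected first (for example with `str(jData$plan, max.level = 2)`).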
Upvotes: 3