Reputation: 21
I am trying to extract the date information from the following HTML using R and xpathSApply:
</td>
</tr>
<tr
data-row-id="1363827503"
class="future "
data-lat-from="-33.946098"
data-lon-from="151.1772"
data-lat-to="33.94252"
data-lon-to="-118.406998"
data-name-from="Sydney Kingsford Smith Airport"
data-name-to="Los Angeles International Airport"
data-date="2015-03-23"
data-flight=""
data-flight-number="VA1"
>
Here is the code in R I have tried:
library(XML)
url <- "http://www.flightradar24.com/data/flights/va1/"
info <- htmlTreeParse(url, useInternalNodes = TRUE)
xpathSApply(info, "//data-date", xmlValue)
This returns: list()
I would like it to return: 2015-03-23
Upvotes: 2
Views: 153
Reputation: 22617
This is the part of the document you are interested in:
<tr
data-row-id="1363827503"
class="future "
data-lat-from="-33.946098"
data-lon-from="151.1772"
data-lat-to="33.94252"
data-lon-to="-118.406998"
data-name-from="Sydney Kingsford Smith Airport"
data-name-to="Los Angeles International Airport"
data-date="2015-03-23"
data-flight=""
data-flight-number="VA1"
>
As you can see, data-date is not an element; it is an attribute of a tr element. Use //tr/@data-date as the XPath expression to retrieve the data-date attribute.
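As a minimal sketch of the attribute lookup, the snippet below parses an inline copy of the markup rather than the live page, so it does not depend on the site still serving the same HTML. The second row (ID 1363827504, date 2015-03-24) and the td contents are made up for illustration, and the variable names html, doc and dates are my own. Note that no xmlValue is passed here: as far as I know, the XML package already returns attribute matches as a character vector.

```r
library(XML)

# Inline stand-in for the page; the second row is hypothetical, added only
# to show that the expression matches every tr that has a data-date.
html <- '<table>
<tr data-row-id="1363827503" class="future " data-date="2015-03-23" data-flight-number="VA1"><td>a</td></tr>
<tr data-row-id="1363827504" class="future " data-date="2015-03-24" data-flight-number="VA1"><td>b</td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# Select the attribute itself rather than an element named data-date.
# as.character() drops the attribute names attached to the result.
dates <- as.character(unlist(xpathSApply(doc, "//tr/@data-date")))
print(dates)

# Alternative: select the tr elements and read the attribute from each node.
dates_alt <- xpathSApply(doc, "//tr", xmlGetAttr, "data-date")
```

Either form should give one date per matching row; the second is handy when you want several attributes from the same tr.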
But note that there are multiple data-date attributes in this document. To retrieve only a single attribute, you also need a way to identify a specific row, for instance with
//tr[@data-row-id="1363827503"]/@data-date
The ID 1363827503 occurs only once in this document and is therefore a unique identifier for this tr element.
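A short sketch of the single-row lookup in R, again against an inline copy of the markup instead of the live URL. The second row is hypothetical and only there to show that the predicate filters it out; html, doc and one_date are illustrative names.

```r
library(XML)

# Inline stand-in for the page; the second row is hypothetical.
html <- '<table>
<tr data-row-id="1363827503" class="future " data-date="2015-03-23" data-flight-number="VA1"><td>a</td></tr>
<tr data-row-id="1363827504" class="future " data-date="2015-03-24" data-flight-number="VA1"><td>b</td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# The predicate [@data-row-id="..."] restricts the match to one tr,
# so only that row's data-date attribute is returned.
one_date <- as.character(unlist(
  xpathSApply(doc, '//tr[@data-row-id="1363827503"]/@data-date')
))
print(one_date)
```

Single quotes around the R string avoid having to escape the double quotes inside the XPath predicate.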
Upvotes: 2