Reputation: 7830
I'm looking for a simple and efficient way to convert XML data into a data.frame (keeping only some of the elements).
I have this file: http://www-sop.inria.fr/members/Philippe.Poulard/projet/2013/entries_hotels.xml
I used xpathSApply, but that doesn't work well because it doesn't preserve empty elements.
In the file some latitudes are empty, but with xpathSApply I can't tell which hotels have an empty latitude element, because those elements are skipped.
I found the xmlToList function, and it works nicely with XML because the resulting list has pretty much the same structure as the document (it avoids having many NULL values in a data frame).
But now I have 2 problems:
If I want to create a data frame from this list with an exhaustive set of elements and keep the NULL elements, how can I do that? I tried this, but the NULLs aren't kept in my vector:
library(XML)
hotels <- "http://www-sop.inria.fr/members/Philippe.Poulard/projet/2013/entries_hotels.xml"
list <- xmlToList(hotels)
latitudes.hotels <- c()
for(element in list) {latitudes.hotels <- c(latitudes.hotels, element$latitude)}
My second problem is that if I want to work directly with my list, all the names are the same: "entry".
So I wonder how I can access the entry whose ID equals x, for example with something like which(list$entry$ID == x).
I can do it with the same kind of vector as above:
ids.hotels <- c()
for(element in list) {ids.hotels <- c(ids.hotels, element$ID)}
list[[which(ids.hotels == x)]]
But I think there must be a better way to do it, and this approach breaks if an ID element is empty in my XML file.
Thank you for any help.
Upvotes: 0
Views: 477
Reputation: 25844
I'm not familiar with the XML package; however, you can extract the elements using base functions and still retain the missing longitude/latitude.
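library(XML)
library(plyr)
# hotels is the URL defined in the question
lst <- xmlToList(hotels)
# Collect the wanted fields of each entry; c() drops NULL fields, so entries
# with an empty latitude/longitude yield a shorter named vector
ll <- lapply(1:150, function(z)
  c(id = lst[[z]][['ID']], name = lst[[z]][['name_fr']],
    lat = lst[[z]][['latitude']], long = lst[[z]][['longitude']]))
# rbind.fill pads the columns missing from a row with NA
df <- rbind.fill(
  lapply(ll, function(y) as.data.frame(t(y), stringsAsFactors = FALSE)))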
I got rbind.fill from this question: "do.call(rbind, list) for uneven number of column".
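If you would rather avoid the plyr dependency, something along these lines might also work in base R: turn each missing field into NA explicitly before building the columns. This is only a sketch. The small lst below is a made-up stand-in for the result of xmlToList(hotels), and field_or_na is a hypothetical helper name, not part of any package.

```r
# Hypothetical stand-in for xmlToList(hotels): every element is named
# "entry", and an empty <latitude/> or <longitude/> comes back as NULL.
lst <- list(
  entry = list(ID = "1", name_fr = "Hotel A", latitude = "43.70", longitude = "7.26"),
  entry = list(ID = "2", name_fr = "Hotel B", latitude = NULL, longitude = NULL),
  entry = list(ID = "3", name_fr = "Hotel C", latitude = "43.55", longitude = "7.01")
)

# Return the field as a single string, or NA when it is missing or empty.
field_or_na <- function(entry, field) {
  value <- entry[[field]]
  if (is.null(value) || length(value) == 0) NA_character_ else as.character(value)[1]
}

fields <- c("ID", "name_fr", "latitude", "longitude")

# Build one character column per field, with one value per entry.
columns <- lapply(fields, function(f) {
  vapply(lst, field_or_na, character(1), field = f, USE.NAMES = FALSE)
})
names(columns) <- fields
df <- as.data.frame(columns, stringsAsFactors = FALSE)

# Convert the coordinates to numeric; the NA entries stay NA.
df$latitude  <- as.numeric(df$latitude)
df$longitude <- as.numeric(df$longitude)
```

Because every entry contributes exactly one value per field, the rows stay aligned with the entries, and is.na(df$latitude) tells you which hotels have an empty latitude.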
Also, although all the names of the list are 'entry' (e.g. names(lst[1]) for the first one), you can get the field names of an entry with names(lst[[1]]).
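For the lookup by ID, one possible approach is to build the ID vector with one slot per entry, so an empty ID becomes NA instead of shifting the positions. Again this is just a sketch: the lst below is an invented stand-in for xmlToList(hotels), and x is an example ID.

```r
# Hypothetical stand-in for xmlToList(hotels); the second entry has no ID.
lst <- list(
  entry = list(ID = "1", name_fr = "Hotel A"),
  entry = list(ID = NULL, name_fr = "Hotel B"),
  entry = list(ID = "3", name_fr = "Hotel C")
)

# One ID per entry, NA where the ID is missing, so that position i in ids
# always corresponds to lst[[i]].
ids <- vapply(lst, function(e) {
  id <- e[["ID"]]
  if (is.null(id)) NA_character_ else as.character(id)[1]
}, character(1), USE.NAMES = FALSE)

x <- "3"                      # example ID to look up
match_idx <- which(ids == x)  # which() ignores the NA comparisons

# Take the first matching entry, or NULL when no entry has that ID.
hotel <- if (length(match_idx) > 0) lst[[match_idx[1]]] else NULL
```

The difference from growing the vector with c() in a loop is that vapply always returns exactly one value per entry, so the index found in ids can be used directly on lst.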
Upvotes: 1