Julien Navarre

Reputation: 7830

Transform XML into a data frame

I'm looking for a simple and efficient way to convert XML data into a data.frame (only some of the elements, not all of them).

I have this file: http://www-sop.inria.fr/members/Philippe.Poulard/projet/2013/entries_hotels.xml

I used xpathSApply, but it doesn't work well here because it doesn't preserve empty elements. Some latitudes in the file are empty, and xpathSApply silently skips them, so I can't tell which hotels have an empty latitude element.

I found the xmlToList function, which is nice for XML because the resulting list has pretty much the same structure as the document (it avoids having many NULL values in a data frame).

But now I have two problems:

If I want to create a data frame from this list with an exhaustive set of elements and keep the NULL elements, how can I do that? I tried this, but the NULLs aren't kept in my vector:

library(XML)
hotels <- "http://www-sop.inria.fr/members/Philippe.Poulard/projet/2013/entries_hotels.xml"
list <- xmlToList(hotels)
latitudes.hotels <- c()
for(element in list) {latitudes.hotels <- c(latitudes.hotels, element$latitude)}

My second problem is that if I want to work directly with the list, all the names are the same: "entry".
So I wonder how I can access the entry whose ID equals x, for example with something like which(list$entry$ID == x).
I can do it with the same kind of vector as above:

ids.hotels <- c()
for(element in list) {ids.hotels <- c(ids.hotels, element$ID)}
list[[which(ids.hotels == x)]]

But I think there must be a better way to do it, and this gives the wrong result if an ID element is empty in my XML file.

Thank you for any help

Upvotes: 0

Views: 477

Answers (1)

user20650

Reputation: 25844

I'm not familiar with the XML package; however, you can extract the elements with base functions and still retain the missing longitude/latitude values.

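library(XML)
library(plyr)

# `hotels` is the URL defined in the question
lst <- xmlToList(hotels)

# Keep only the elements named "entry" instead of hard-coding 1:150
entries <- lst[names(lst) == "entry"]

# One named vector per entry; c() drops NULL fields, so an entry
# without a latitude simply has no "lat" element
ll <- lapply(entries, function(e)
                c(id = e[['ID']], name = e[['name_fr']],
                  lat = e[['latitude']], long = e[['longitude']]))

# rbind.fill() fills the columns missing from a row with NA
df <- rbind.fill(
            lapply(ll, function(y) as.data.frame(t(y), stringsAsFactors = FALSE)))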

I got the rbind.fill idea from this question: "do.call(rbind, list) for uneven number of column".
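If you would rather not depend on plyr, the same idea can be sketched in base R by replacing each NULL field with NA before building the columns. This is only a sketch under assumptions: the small `lst` below is a made-up stand-in for the output of xmlToList() (so the example runs without the network), and `get_field` is a hypothetical helper, not part of the XML package.

```r
# Toy stand-in for xmlToList(hotels); the second entry has empty coordinates
lst <- list(
  entry = list(ID = "1", name_fr = "Hotel A", latitude = "43.70", longitude = "7.26"),
  entry = list(ID = "2", name_fr = "Hotel B", latitude = NULL, longitude = NULL),
  entry = list(ID = "3", name_fr = "Hotel C", latitude = "43.55", longitude = "7.01")
)

# Hypothetical helper: the field as a string, or NA when it is absent or NULL
get_field <- function(entry, field) {
  value <- entry[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Column name in the data frame -> field name in the XML
fields <- c(id = "ID", name = "name_fr", lat = "latitude", long = "longitude")

# One character vector per field; vapply() returns exactly one value per entry,
# so every column has the same length and rows stay aligned
cols <- lapply(fields, function(f)
  vapply(lst, get_field, character(1), field = f, USE.NAMES = FALSE))

df <- as.data.frame(cols, stringsAsFactors = FALSE)
```

Because every entry contributes exactly one value per field, the hotels with an empty latitude show up as rows with NA in `lat` rather than disappearing.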

Also, although every top-level element of the list is named 'entry' (e.g. names(lst[1]) for the first one), you can get the field names inside an entry with names(lst[[1]]).
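For looking up an entry by its ID, one possible approach is to build an ID vector that keeps an NA placeholder for entries whose ID is empty, so that positions in the vector still match positions in the list. The sketch below again uses a made-up `lst` in place of the real xmlToList() output, and assumes each ID is a single string.

```r
# Toy stand-in for xmlToList(hotels); the second entry has an empty ID
lst <- list(
  entry = list(ID = "1", name_fr = "Hotel A"),
  entry = list(ID = NULL, name_fr = "Hotel B"),
  entry = list(ID = "3", name_fr = "Hotel C")
)

# One ID per entry, NA where the ID is missing, so length(ids) == length(lst)
ids <- vapply(lst, function(e) {
  id <- e[["ID"]]
  if (is.null(id)) NA_character_ else as.character(id)
}, character(1), USE.NAMES = FALSE)

x <- "3"
pos <- which(ids == x)  # which() skips the NA coming from the missing ID
hotel <- lst[[pos]]     # assumes exactly one entry has this ID
```

With the loop from the question, the empty ID would be dropped and "3" would end up at position 2, which points at the wrong entry; keeping the NA avoids that shift.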

Upvotes: 1
