speendo

Reputation: 13345

Import gpx track using the XML library

I want to analyse a GPX track in R. To import the data, I am trying to use the XML package.

I found a tutorial that explained how to import each individual data vector and then combine them to a data frame.

However, in my use case this does not work: for some track points no heart rate (<gpxtpx:hr>) was recorded, so the vectors end up with different lengths.

Therefore I am trying to import all relevant data at once.

This is what I have managed so far:

library(XML)

filename <- "sample.gpx"
download.file("https://owncloud.yeara.net/index.php/s/Io4uOq6sfFuCCdq/download", filename) # downloads a sample file from my server

gpx.raw <- xmlTreeParse(filename, useInternalNodes = TRUE)

rootNode <- xmlRoot(gpx.raw)

print(rootNode) # output seems okay

Now, instead of the root node, I'd like to import the content of <trkseg> into a data frame. It should be designed in the following way:

Can you help me to achieve this?

Upvotes: 0

Views: 2047

Answers (2)

BenG321

Reputation: 11

Here is a version similar to the answer from @speendo, but using dplyr and purrr:

library(XML)
library(dplyr)
library(purrr)

filename <- "Downloads/activity(1).gpx"

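gpx <- filename %>%
  xmlTreeParse(useInternalNodes = TRUE) %>%
  xmlRoot %>%
  xmlToList %>%
  # keep only the track element
  (function(x) x$trk) %>%
  # select every trkseg (they all share one name) and flatten one level
  (function(x) unlist(x[names(x) == "trkseg"], recursive = FALSE)) %>%
  # one row per track point; missing fields become NA
  map_df(function(x) as.data.frame(t(unlist(x)), stringsAsFactors = FALSE))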
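
To show why the last step copes with track points that have no heart rate, here is a minimal sketch on made-up data. It calls dplyr::bind_rows directly (which is what map_df uses to combine its results); the toy list pts and its values are assumptions for illustration, not taken from the sample file.

```r
library(dplyr)

# Toy stand-in for two parsed track points; the second has no hr value
pts <- list(
  list(ele = "170.0", hr = "120"),
  list(ele = "171.5")
)

# Turn each point into a one-row data frame, exactly as in the pipeline above
rows <- lapply(pts, function(x) as.data.frame(t(unlist(x)), stringsAsFactors = FALSE))

# bind_rows matches columns by name and fills the missing hr with NA
gpx_toy <- bind_rows(rows)
print(gpx_toy)
```

Because columns are matched by name rather than by position, the vectors never need to have the same length.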

Upvotes: 1

speendo

Reputation: 13345

This is the code I ended up with. Thanks to all of you (especially @lukeA) for your help.

library(XML)
library(plyr)

filename <- "Downloads/activity(1).gpx"

gpx.raw <- xmlTreeParse(filename, useInternalNodes = TRUE)

rootNode <- xmlRoot(gpx.raw)

gpx.rawlist <- xmlToList(rootNode)$trk

gpx.list <- unlist(gpx.rawlist[names(gpx.rawlist) == "trkseg"], recursive = FALSE)

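# Bind one row per track point; rbind.fill pads missing columns (e.g. hr) with NA
gpx <- do.call(rbind.fill, lapply(gpx.list, function(x) as.data.frame(t(unlist(x)), stringsAsFactors = FALSE)))
# Shorten the column names (assumes the columns come out in this order for your file)
names(gpx) <- c("ele", "time", "hr", "lon", "lat")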

I had some trouble with multiple trksegs, as I could not access them by name (they all have the same name in the list: trkseg). I solved this by selecting the matching elements of gpx.rawlist with a comparison on names() and then flattening them with unlist.
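
A small sketch of the duplicate-name problem, using a made-up list in place of the real gpx.rawlist (the element names and values here are assumptions for illustration only):

```r
# Toy stand-in for xmlToList(rootNode)$trk with two segments
trk <- list(
  name   = "Morning run",
  trkseg = list(pt1 = "a", pt2 = "b"),
  trkseg = list(pt3 = "c")
)

# `$` only ever returns the first element called "trkseg"
first_seg <- trk$trkseg
length(first_seg)

# Comparing against names() keeps every matching element
segs <- trk[names(trk) == "trkseg"]
length(segs)

# unlist(recursive = FALSE) removes one nesting level: one entry per point
pts <- unlist(segs, recursive = FALSE)
length(pts)
```

Here trk$trkseg silently drops the second segment, while the names() comparison followed by unlist keeps all three points.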

I wonder if there is a more elegant way, but at least this seems to work.
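
One possible alternative, sketched here on a tiny inline GPX document rather than the real file, is to walk the trkpt nodes with XPath and fall back to NA when a child is missing. This is only a sketch: the inline data and the helper child_text are made up for illustration, and matching on local-name() is just one assumed way to sidestep the GPX default namespace.

```r
library(XML)

# Minimal made-up GPX; the second track point has no heart rate
gpx_text <- '<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk>
    <trkseg>
      <trkpt lat="48.20" lon="16.37">
        <ele>170.0</ele>
        <time>2016-01-01T10:00:00Z</time>
        <extensions><gpxtpx:TrackPointExtension><gpxtpx:hr>120</gpxtpx:hr></gpxtpx:TrackPointExtension></extensions>
      </trkpt>
      <trkpt lat="48.21" lon="16.38">
        <ele>171.5</ele>
        <time>2016-01-01T10:00:05Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>'

doc <- xmlParse(gpx_text, asText = TRUE)

# All track points, regardless of which trkseg they belong to
pts <- getNodeSet(doc, "//*[local-name()='trkpt']")

# Hypothetical helper: text of the first descendant with this name, or NA
child_text <- function(node, name) {
  val <- xpathSApply(node, sprintf(".//*[local-name()='%s']", name), xmlValue)
  if (length(val) == 0) NA_character_ else val[[1]]
}

# One row per track point; a missing hr simply becomes NA
gpx_xpath <- do.call(rbind, lapply(pts, function(pt) {
  data.frame(
    lat  = as.numeric(xmlGetAttr(pt, "lat")),
    lon  = as.numeric(xmlGetAttr(pt, "lon")),
    ele  = as.numeric(child_text(pt, "ele")),
    time = child_text(pt, "time"),
    hr   = as.numeric(child_text(pt, "hr")),
    stringsAsFactors = FALSE
  )
}))
print(gpx_xpath)
```

With this approach the column names and order are set explicitly, so they do not depend on how the list happens to be flattened. For a real file, replace the inline text with xmlParse(filename).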

Upvotes: 1
