Reputation: 899
I wish to apply a set of pre-written functions to subsets of a data frame that progressively increase in size. In this example, the pre-written functions calculate 1) the distance between each consecutive pair of locations in a series of data points, 2) the total distance of the series (the sum of step 1), 3) the straight-line distance between the start and end locations of the series and 4) the ratio between the straight-line distance (step 3) and the total distance (step 2). I wish to know how to apply these steps (and, by extension, similar functions) to sub-groups of increasing size within a data frame. Below are some example data and the pre-written functions.
Example data:
> dput(df)
structure(list(latitude = c(52.640715, 52.940366, 53.267749,
53.512608, 53.53215, 53.536443), longitude = c(3.305727, 3.103194,
2.973257, 2.966621, 3.013587, 3.002674)), .Names = c("latitude",
"longitude"), class = "data.frame", row.names = c(NA, -6L))
Latitude Longitude
1 52.64072 3.305727
2 52.94037 3.103194
3 53.26775 2.973257
4 53.51261 2.966621
5 53.53215 3.013587
6 53.53644 3.002674
Pre-written functions:
# Step 1: To calculate the distance between a pair of locations
pairdist = sapply(2:nrow(df), function(x) with(df, trackDistance(longitude[x-1], latitude[x-1], longitude[x], latitude[x], longlat=TRUE)))
# Step 2: To sum the total distance between all locations
totdist = sum(pairdist)
# Step 3: To calculate the distance between the first and end location
straight = trackDistance(df[1,2], df[1,1], df[nrow(df),2], df[nrow(df),1], longlat=TRUE)
# Step 4: To calculate the ratio between the straightline distance & total distance
distrat = straight/totdist
I would like to apply the functions firstly to a sub-group of only the first two rows (i.e. rows 1-2), then to a subgroup of the first three rows (rows 1-3), then four rows…and so on…until I get to the end of the data frame (in the example this would be a sub-group containing rows 1-6, but it would be nice to know how to apply this to any data frame).
Desired output:
Subgroup Totdist Straight Ratio
1 36.017 36.017 1.000
2 73.455 73.230 0.997
3 100.694 99.600 0.989
4 104.492 101.060 0.967
5 105.360 101.672 0.965
I have attempted to do this with no success and at the moment this is beyond my ability. Any advice would be very much appreciated!
Upvotes: 3
Views: 166
Reputation: 108543
There's a lot of optimization that can be done. trackDistance() is vectorized, so you don't need sapply() for that, and cumsum() gives you the running total distance for each growing subgroup.
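As a small illustration of why a cumulative sum covers the growing subgroups: the total distance for rows 1 to k+1 is just the running sum of the first k step lengths. The step lengths below are made-up numbers, not values from the question.

```r
# Hypothetical step lengths (km) between consecutive points; not real data
steps <- c(10, 20, 5, 2.5)

# Explicit version: re-sum the steps for each growing subgroup
explicit <- sapply(seq_along(steps), function(k) sum(steps[1:k]))

# Vectorized version: a single running total
running <- cumsum(steps)

# Both approaches give the same vector
all.equal(explicit, running)
```

The explicit version recomputes every prefix from scratch, while cumsum() walks the vector once.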
To get everything in one function that outputs the desired data frame, you can do something along these lines:
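myFun <- function(x){
  # This is just to make typing easier in the rest of the function.
  # Column names are case-sensitive: this expects "Latitude" and "Longitude".
  lat <- x[["Latitude"]]
  lon <- x[["Longitude"]]
  nr <- nrow(x)
  # Distance between each consecutive pair of locations
  pairdist <- trackDistance(lon[-nr], lat[-nr],
                            lon[-1], lat[-1],
                            longlat=TRUE)
  # Running total distance for each growing subgroup
  totdist <- cumsum(pairdist)
  # Straight-line distance from the first location to every later one
  straight <- trackDistance(rep(lon[1], nr-1),
                            rep(lat[1], nr-1),
                            lon[-1], lat[-1],
                            longlat=TRUE)
  ratio <- straight/totdist
  data.frame(totdist, straight, ratio)
}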
Proof of concept:
> myFun(df)
totdist straight ratio
1 36.01777 36.01777 1.0000000
2 73.45542 73.22986 0.9969293
3 100.69421 99.60013 0.9891346
4 104.49261 101.06023 0.9671519
5 105.35956 101.67203 0.9650005
Note that you can add extra arguments to define the latitude and longitude columns. Also watch your capitalization: in your question you use Latitude in the data frame, but latitude (lowercase l) in your code.
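One way the extra arguments could look is sketched below. The names myFun2, latCol, lonCol and distFun are hypothetical, and haversineKm() is a base-R stand-in distance function used only so the sketch runs without extra packages; its values will differ slightly from those of trackDistance().

```r
# Stand-in great-circle distance (haversine, km), an assumption for this sketch.
# Argument order matches the trackDistance() calls above: lon1, lat1, lon2, lat2.
haversineKm <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# latCol, lonCol and distFun are hypothetical argument names
myFun2 <- function(x, latCol = "latitude", lonCol = "longitude",
                   distFun = haversineKm) {
  lat <- x[[latCol]]
  lon <- x[[lonCol]]
  nr <- nrow(x)
  # Fail early on a misspelled (or mis-capitalized) column name
  stopifnot(!is.null(lat), !is.null(lon), nr >= 2)

  pairdist <- distFun(lon[-nr], lat[-nr], lon[-1], lat[-1])
  totdist <- cumsum(pairdist)
  straight <- distFun(rep(lon[1], nr - 1), rep(lat[1], nr - 1),
                      lon[-1], lat[-1])
  data.frame(totdist, straight, ratio = straight / totdist)
}

# Data from the question, with the lowercase names given by dput()
df <- data.frame(
  latitude  = c(52.640715, 52.940366, 53.267749, 53.512608, 53.53215, 53.536443),
  longitude = c(3.305727, 3.103194, 2.973257, 2.966621, 3.013587, 3.002674)
)
res <- myFun2(df)
```

To use the original distance function instead, something like distFun = function(...) trackDistance(..., longlat=TRUE) should work, and a data frame with capitalized columns can be handled with myFun2(df, "Latitude", "Longitude").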
Upvotes: 2