Reputation: 183
I have a dataset with origin-destination data and some associated variables. It looks something like this:
"Origin","Destination","distance","volume"
"A01" "A01" 0.0 10
"A02" "A01" 1.2 9
"A03" "A01" 1.4 15
"A01" "A02" 1.2 16
For each origin-destination pair, I want to calculate additional variables based on data in both that row and selected other rows: for example, how many other origin areas going to the same destination have a traffic volume greater than that of the focal pair. In this example, I would end up with the following for destination A01:
"Origin","Destination","distance","volume","greater_flow"
"A01" "A01" 0.0 10 1
"A02" "A01" 1.2 9 2
"A03" "A01" 1.4 15 0
I have been trying to work out something with group_by and apply, but can't work out how to a) 'fix' the data I want to use as a reference (volume from A01 to A01), b) restrict the comparison to data with the same destination (A01), and c) repeat for all origin-destination pairs.
Upvotes: 4
Views: 2160
Reputation: 1513
Here is an answer using base R (using the apply family):
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"), Destination = c("A01", "A01", "A01", "A02"), distance = c(0.0, 1.2, 1.4, 1.2), volume = c(10, 9, 15, 16))
# extracting entries with destination = A01
d2 <- d[d[, "Destination"] == "A01", ]
# calculating number of rows satisfying your condition
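# (iterate over the numeric column directly: apply() on a data frame with
# character columns would coerce each row to character before comparing)
greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))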
# sticking things back together
data.frame(d2, greater_flow)
# Origin Destination distance volume greater_flow
# 1 A01 A01 0.0 10 1
# 2 A02 A01 1.2 9 2
# 3 A03 A01 1.4 15 0
If you need to do the calculation for all possible destinations, you can just cycle through unique(d[, "Destination"]):
output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
  # rows for this destination only
  d2 <- d[d[, "Destination"] == dest, ]
  # for each row, count the rows with a strictly greater volume
  greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))
  data.frame(d2, greater_flow)
})
You can then glue the output together if needed via do.call(rbind, output).
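As an alternative to the explicit loop over destinations, here is a minimal base R sketch that computes the count for every destination in one call with ave(). The helper name count_greater is made up for illustration, and the data frame is the same toy example as above.

```r
# Toy data from the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# Hypothetical helper: for each value, count how many values in the
# same vector are strictly greater
count_greater <- function(v) vapply(v, function(vi) sum(v > vi), numeric(1))

# ave() applies count_greater within each Destination group and returns
# the results in the original row order
d$greater_flow <- ave(d$volume, d$Destination, FUN = count_greater)
d
```

Because ave() keeps the original row order, nothing needs to be glued back together afterwards.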
Upvotes: 2
Reputation: 12559
library(plyr)
# rank(-volume, ties.method = "min") - 1 is the number of rows with a strictly greater volume
Fun <- function(x) { x$greater_flow <- rank(-x$volume, ties.method = "min") - 1; x }
ddply(d, ~ Destination, .fun=Fun)
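Position-based counts need some care when two origins have the same volume. The sketch below, using made-up volumes for a single destination with a tie at 10, compares the direct count of strictly greater values with a rank-based shortcut; it is only meant as a quick sanity check, not as part of the plyr call above.

```r
# Hypothetical volumes for one destination, with a tie at 10
volume <- c(10, 9, 15, 10)

# Direct definition: number of values strictly greater than each entry
direct <- vapply(volume, function(v) sum(volume > v), numeric(1))

# Rank-based shortcut: ranking the negated values with ties.method = "min"
# gives 1 + (number of strictly greater values)
ranked <- rank(-volume, ties.method = "min") - 1

direct
ranked
```

Both vectors should agree, with tied volumes receiving the same count.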
Upvotes: 0