B_Dabbler

Reputation: 183

Rowwise comparison of data in R

I have a dataset with origin-destination data and some associated variables. It looks something like this:

    "Origin","Destination","distance","volume"
    "A01"     "A01"          0.0        10
    "A02"     "A01"          1.2         9
    "A03"     "A01"          1.4        15 
    "A01"     "A02"          1.2        16

Then, for each origin-destination pair, I want to calculate additional variables based on data in both that row and selected other rows. For example, I want to count how many other origin areas going to the same destination have a traffic volume greater than that of the focal pair. In this example, I would end up with the following for destination A01:

    "Origin","Destination","distance","volume","greater_flow"
    "A01"    "A01"            0.0        10         1
    "A02"    "A01"            1.2         9         2
    "A03"    "A01"            1.4        15         0

I have been trying to work out something with group_by and apply, but I can't work out how to (a) 'fix' the data I want to use as a reference (volume from A01 to A01), (b) restrict the comparison to rows with the same destination (A01), and (c) repeat this for all origin-destination pairs.

Upvotes: 4

Views: 2160

Answers (2)

stas g

Reputation: 1513

Here is an answer using base R (using sapply):

    d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                    Destination = c("A01", "A01", "A01", "A02"),
                    distance = c(0.0, 1.2, 1.4, 1.2),
                    volume = c(10, 9, 15, 16))

    # extracting entries with destination = A01
    d2 <- d[d[, "Destination"] == "A01", ]

    # counting, for each row, the rows with a strictly greater volume
    # (sapply over the numeric column; apply() over the data frame would coerce each row to character)
    greater_flow <- sapply(d2[, "volume"], FUN = function(v) sum(v < d2[, "volume"]))

    # sticking things back together
    data.frame(d2, greater_flow)

    #   Origin Destination distance volume greater_flow
    # 1    A01         A01      0.0     10            1
    # 2    A02         A01      1.2      9            2
    # 3    A03         A01      1.4     15            0

If you need to do the calculation for all possible destinations, you can just cycle through unique(d[, "Destination"]):

    output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
        d2 <- d[d[, "Destination"] == dest, ]
        # numeric comparison within the destination group
        greater_flow <- sapply(d2[, "volume"], FUN = function(v) sum(v < d2[, "volume"]))
        data.frame(d2, greater_flow)
    })

You can then glue the output together if needed via do.call(rbind, output).
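
As a possible alternative to looping and gluing, the per-destination count can be written with base R's ave(), which applies a function within each group and returns the results in the original row order. This is only a sketch, not part of the original answer; the helper name count_greater is made up for illustration.

```r
# Example data from the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# Hypothetical helper: for each volume, count the strictly larger volumes in the same group
count_greater <- function(v) vapply(v, function(x) sum(v > x), integer(1))

# ave() runs the helper within each Destination and keeps d's row order
d$greater_flow <- ave(d$volume, d$Destination, FUN = count_greater)

d$greater_flow  # 1 2 0 0
```

Because the new column comes back aligned with the rows of d, no subsetting or rbind step is needed.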

Upvotes: 2

jogo

Reputation: 12559

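    library(plyr)

    # sort each destination group by decreasing volume; a row's position minus one
    # is then the number of larger flows (assuming no tied volumes within a group)
    Fun <- function(x) { x <- x[order(x$volume, decreasing = TRUE), ]; x$greater_flow <- seq_len(nrow(x)) - 1; x }
    ddply(d, ~ Destination, .fun = Fun)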
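
One caveat with counting by position in a sorted group is that tied volumes receive different counts. The sketch below, which is not from the original answer and uses made-up volumes, compares the positional count with a rank()-based count using ties.method = "min", which gives tied rows the same number of strictly larger flows.

```r
# Hypothetical volumes for one destination: two origins tie at 10
v <- c(10, 10, 9, 15)

# Position after sorting by decreasing volume, minus one
by_position <- integer(length(v))
by_position[order(v, decreasing = TRUE)] <- seq_along(v) - 1L

# rank() of the negated volumes: ties share the smallest rank,
# so rank - 1 is the number of strictly larger volumes
by_rank <- rank(-v, ties.method = "min") - 1L

by_position  # the two tied rows get 1 and 2
by_rank      # 1 1 3 0
```

If ties cannot occur in the data, both versions give the same result.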

Upvotes: 0
