s1x

Reputation: 574

Compare a value in a vector with all other values in the vector

Let's assume the following dataset:

+---------------+-----------+---------------------+
| flightCarrier | saleTotal | daysBeforeDeparture |
+---------------+-----------+---------------------+
| KL            | 477.99    |                   0 |
| AF            | 457.99    |                   0 |
| SQ            | 556.31    |                   0 |
+---------------+-----------+---------------------+

What I'd like to do is the following:

  1. Compare a value in a column to all other values in the same column.
  2. Is saleTotal(1) smaller than the values of saleTotal(2) and saleTotal(3)?
  3. If yes, by how much? E.g. saleTotal(3)/saleTotal(1) (see the small worked example below the list).
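
For example, comparing row 1 with row 3 of the table above:

556.31 / 477.99   # saleTotal(3) / saleTotal(1)
# [1] 1.163853    -> row 3 is about 16.4% more expensive than row 1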

What I've tried so far:

cal <- apply(df_matrix[1:2, 2], 1, function(x) {
  A <- x                                  # the value(s) handed in by apply
  x <- x[-1]                              # drop the first element before comparing
  ifelse(x > A, 1, ifelse(x < A, 0, NA))  # 1 = larger, 0 = smaller, NA = equal
})
cal

This didn't work out and prints "logical(0)", so I guess there are no results. I tried lots of approaches, with lapply and mapply, but all of them ended up comparing static numbers instead of the other rows.

What I've grasped from apply so far is that each x is the row it is currently iterating over. That's why I tried to compare x > A, where A is the whole vector of saleTotal values, thus iterating through each of them.


Expected business output: "Price is cheaper than XY other prices"

I guess this would be the best way to avoid large matrices and keep memory usage as low as possible. Might there be a way to "nrow()" the results directly rather than creating a matrix/list first?

+-----------+-------------+
| saleTotal | cheaperThan |
+-----------+-------------+
| 477.99    |           1 |
| 457.99    |           2 |
| 556.31    |           0 |
+-----------+-------------+

Any idea how to do this? And what about performance? I have 100,000+ rows.

EDIT: the table above is the expected output (one way to represent it).

Upvotes: 3

Views: 150

Answers (2)

talat

Reputation: 70256

You can use ?outer like this:

outer(df$saleTotal, df$saleTotal, "/")
#          [,1]     [,2]      [,3]
#[1,] 1.0000000 1.043669 0.8592152
#[2,] 0.9581581 1.000000 0.8232640
#[3,] 1.1638528 1.214677 1.0000000

Values greater than 1 indicate an increase, values less than 1 indicate a decrease, and the diagonal of the matrix is all 1s since each value is compared to itself.

Of course you could modify this to only show values greater than 1, for example by using:

res <- outer(df$saleTotal, df$saleTotal, "/")
res * as.integer(res > 1)
#         [,1]     [,2] [,3]
#[1,] 0.000000 1.043669    0
#[2,] 0.000000 0.000000    0
#[3,] 1.163853 1.214677    0

Or, if you just want a logical matrix:

res > 1
#      [,1]  [,2]  [,3]
#[1,] FALSE  TRUE FALSE
#[2,] FALSE FALSE FALSE
#[3,]  TRUE  TRUE FALSE
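
If you want the cheaperThan count from the question's expected output, a small follow-up sketch using the same df: count, per row, how many ratios are below 1:

res <- outer(df$saleTotal, df$saleTotal, "/")
df$cheaperThan <- rowSums(res < 1)  # how many other prices this value is cheaper than
df$cheaperThan
# [1] 1 2 0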

Upvotes: 5

Tensibai

Reputation: 15784

See the note at the end about efficiency.

With your expected output, you may iterate over each value, count (by summing the TRUE values) how many times this value is cheaper than the other values, and return a list to 'pair' each value with its count:

sapply(data[,2],function(x) {
  list(x, sum(x < data[,2]))
})

which gives, in matrix form (values in the first row, counts in the second):

     [,1]   [,2]   [,3]  
[1,] 477.99 457.99 556.31
[2,] 1      2      0     

In case you just wish to add a column to your existing dataset, this should do:

data$cheaperThan <- sapply(data[,2],function(x) sum(x < data[,2])) 
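
On the question's data this should reproduce the expected column:

data$cheaperThan
# [1] 1 2 0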

Data used:
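
A minimal reconstruction of the data frame from the question's table (column types are assumed):

data <- data.frame(
  flightCarrier = c("KL", "AF", "SQ"),
  saleTotal = c(477.99, 457.99, 556.31),
  daysBeforeDeparture = c(0, 0, 0),
  stringsAsFactors = FALSE
)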

A quick timing of the sapply approach against findInterval on a vector named large:

> system.time(sapply(large,function(x) sum(x < large)))
utilisateur     système      écoulé 
       1.08        0.22        1.30 
> system.time(length(large) - findInterval(large,sort(large)))
utilisateur     système      écoulé 
       0.01        0.00        0.01 

@alexis_laz's solution is really, really, really more efficient. Here is a fully reproducible comparison on 50,000 values:

> set.seed(123)
> test <- runif(50000)*100
> identical(sapply(test,function(x) sum(x < test)), (length(test) - findInterval(test,sort(test))))
[1] TRUE
> system.time(sapply(test,function(x) sum(x < test)))
utilisateur     système      écoulé 
      13.64        1.24       14.96 
> system.time(length(test) - findInterval(test,sort(test)))
utilisateur     système      écoulé 
       0.01        0.00        0.02
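
For reference on why this works: findInterval(v, sort(v)) returns, for each value, how many values in the sorted vector are less than or equal to it, so length(v) - findInterval(v, sort(v)) counts how many values are strictly greater. A quick check on the question's saleTotal values:

v <- c(477.99, 457.99, 556.31)
length(v) - findInterval(v, sort(v))  # how many other prices each value is cheaper than
# [1] 1 2 0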

Upvotes: 5
