haimen
haimen

Reputation: 2015

Calculate the minimum difference among the three column and give the corresponding column name in R

Suppose I have data as follows,

data

required_value          Value1          Value2        Value3
     0.5                  .1              0.3           0.4
     1.0                   1              0.7           0.2
     1.5                  .37             0.3           0.7
     2.0                  1.25            0.9           1.9

I want to find which of the three columns(Value1, Value2, Value3) are closest to the required value and create a new column and have the columnname of the column. My sample output is,

data

required_value          Value1          Value2        Value3       output
     0.5                  .1              0.3           0.4        Value3
     1.0                   1              0.7           0.2        Value1
     1.5                  .37             0.3           0.7        Value3
     2.0                  1.25            0.9           1.9        Value3

I am able to find the closet value in this. But not able to get the corresponding column name. Can anybody help me in doing this?

Thanks

Upvotes: 1

Views: 229

Answers (3)

thelatemail
thelatemail

Reputation: 93813

You can break this down to a ?max.col operation, after sweeping out the differences between the first column and the remaining columns:

names(df[-1])[max.col(-abs(sweep(df[-1], 1, df$required_value)),"first")]
#[1] "Value3" "Value1" "Value3" "Value3"

To address @hrbrmstr 's benchmarks that show this is slower. Yes, it will be slower when the data is small as the overhead of the functions will take a few microseconds more. This code should however scale so that while you lose out by fractions of a second on small data, it will run much faster as the size ramps up. Example timings in seconds:

-sweep/max.col
           user  system elapsed
10k        0.00    0.00    0.00
100k       0.17    0.01    0.19
1M         1.36    0.12    1.49 
5M         3.99    0.59    4.58 

-apply
           user  system elapsed 
10k        0.05    0.00    0.05
100k       0.56    0.00    0.57
1M         7.33    0.08    7.41
5M        41.36    0.13   41.52 

Upvotes: 3

hrbrmstr
hrbrmstr

Reputation: 78792

Posting this answer to what looks like a homework assignment against my better judgement.

read.table(text="required_value          Value1          Value2        Value3
     0.5                  .1              0.3           0.4
     1.0                   1              0.7           0.2
     1.5                  .37             0.3           0.7
     2.0                  1.25            0.9           1.9", header=TRUE) -> df


df$output <- apply(df, 1, function(x) {
  names(x)[which.min(abs(x[2:4] - x[1]))+1]
})

##   required_value Value1 Value2 Value3 output
## 1            0.5   0.10    0.3    0.4 Value3
## 2            1.0   1.00    0.7    0.2 Value1
## 3            1.5   0.37    0.3    0.7 Value3
## 4            2.0   1.25    0.9    1.9 Value3

UPDATE:

I knew sweep() had alot of code cruft in it but, wow:

Unit: microseconds
  expr      min       lq     mean   median       uq      max neval
 apply   83.281  103.156  117.414  113.479  126.790  256.216   100
 sweep 1116.052 1194.766 1292.346 1218.801 1301.724 2309.745   100

enter image description here

Upvotes: 3

akuiper
akuiper

Reputation: 214957

You can also transform the data from wide to long format then filter out the variable that has the minimum difference from the required value:

library(data.table)
data$id <- seq_len(nrow(data))   # create an id variable for group by operation
longData <- melt(data, id.vars = c("id", "required_value"))   # transform to long format
data[longData[, .(variable[which.min(abs(value - required_value))]) ,.(id)], on = "id"]

   required_value Value1 Value2 Value3 id     V1
1:            0.5   0.10    0.3    0.4  1 Value3
2:            1.0   1.00    0.7    0.2  2 Value1
3:            1.5   0.37    0.3    0.7  3 Value3
4:            2.0   1.25    0.9    1.9  4 Value3

Upvotes: 1

Related Questions