Reputation: 2015
Suppose I have data as follows,
data
required_value Value1 Value2 Value3
0.5 .1 0.3 0.4
1.0 1 0.7 0.2
1.5 .37 0.3 0.7
2.0 1.25 0.9 1.9
I want to find which of the three columns(Value1, Value2, Value3) are closest to the required value and create a new column and have the columnname of the column. My sample output is,
data
required_value Value1 Value2 Value3 output
0.5 .1 0.3 0.4 Value3
1.0 1 0.7 0.2 Value1
1.5 .37 0.3 0.7 Value3
2.0 1.25 0.9 1.9 Value3
I am able to find the closet value in this. But not able to get the corresponding column name. Can anybody help me in doing this?
Thanks
Upvotes: 1
Views: 229
Reputation: 93813
You can break this down to a ?max.col
operation, after sweeping out the differences between the first column and the remaining columns:
names(df[-1])[max.col(-abs(sweep(df[-1], 1, df$required_value)),"first")]
#[1] "Value3" "Value1" "Value3" "Value3"
To address @hrbrmstr 's benchmarks that show this is slower. Yes, it will be slower when the data is small as the overhead of the functions will take a few microseconds more. This code should however scale so that while you lose out by fractions of a second on small data, it will run much faster as the size ramps up. Example timings in seconds:
-sweep/max.col
user system elapsed
10k 0.00 0.00 0.00
100k 0.17 0.01 0.19
1M 1.36 0.12 1.49
5M 3.99 0.59 4.58
-apply
user system elapsed
10k 0.05 0.00 0.05
100k 0.56 0.00 0.57
1M 7.33 0.08 7.41
5M 41.36 0.13 41.52
Upvotes: 3
Reputation: 78792
Posting this answer to what looks like a homework assignment against my better judgement.
read.table(text="required_value Value1 Value2 Value3
0.5 .1 0.3 0.4
1.0 1 0.7 0.2
1.5 .37 0.3 0.7
2.0 1.25 0.9 1.9", header=TRUE) -> df
df$output <- apply(df, 1, function(x) {
names(x)[which.min(abs(x[2:4] - x[1]))+1]
})
## required_value Value1 Value2 Value3 output
## 1 0.5 0.10 0.3 0.4 Value3
## 2 1.0 1.00 0.7 0.2 Value1
## 3 1.5 0.37 0.3 0.7 Value3
## 4 2.0 1.25 0.9 1.9 Value3
UPDATE:
I knew sweep()
had alot of code cruft in it but, wow:
Unit: microseconds
expr min lq mean median uq max neval
apply 83.281 103.156 117.414 113.479 126.790 256.216 100
sweep 1116.052 1194.766 1292.346 1218.801 1301.724 2309.745 100
Upvotes: 3
Reputation: 214957
You can also transform the data from wide to long format then filter out the variable that has the minimum difference from the required value:
library(data.table)
data$id <- seq_len(nrow(data)) # create an id variable for group by operation
longData <- melt(data, id.vars = c("id", "required_value")) # transform to long format
data[longData[, .(variable[which.min(abs(value - required_value))]) ,.(id)], on = "id"]
required_value Value1 Value2 Value3 id V1
1: 0.5 0.10 0.3 0.4 1 Value3
2: 1.0 1.00 0.7 0.2 2 Value1
3: 1.5 0.37 0.3 0.7 3 Value3
4: 2.0 1.25 0.9 1.9 4 Value3
Upvotes: 1