abudis
abudis

Reputation: 2881

Get the name of the first and second lowest values across columns and their names

Suppose I have a data.table like this:

testdt <- data.table(col1 = c(1, 2, 3), col2=c(1.1, 1.9, 3.1), col3=c(1, 2.2, 3))
testdt

A data.table: 3 × 3
col1    col2    col3
<dbl>   <dbl>   <dbl>
1   1.1 1.0
2   1.9 2.2
3   3.1 3.0

I'm trying to get the names of the first and second lowest value column names and the difference between their values.

The result would look like this:

A data.table: 3 × 8
col1    col2    col3    min_col    min_value    second_col    second_value    diff 
<dbl>   <dbl>   <dbl>   <chr>      <dbl>        <chr>         <dbl>           <dbl>
1       1.1     1.0     col1       1            col2          1.1             0.1
2       1.9     2.2     col2       1.9          col1          2               0.1
3       3.1     3.0     col1       3            col2          3.1             0.1

So I started with getting the minimum col name and value:

testdt[, min_col := colnames(.SD)[max.col(-.SD, ties.method="first")], .SDcols=c("col1", "col2", "col3")]
testdt[, min_value := pmin(col1, col2, col3)]

But I don't know if there's any data.tablely way to get the rest of what I need, i.e. the second lowest column name and the value.

Upvotes: 1

Views: 64

Answers (2)

Ma&#235;l
Ma&#235;l

Reputation: 51994

With frankv:

library(data.table)
testdt[
  , min2_col := colnames(.SD)[apply(.SD, 1, \(x) which(frankv(x, ties.method = "dense") == 2)[1])],
       .SDcols=c("col1", "col2", "col3")
  ][
    , min2_value := unique(apply(.SD, 1, \(x) x[frankv(x, ties.method = "dense") == 2])),
       .SDcols=c("col1", "col2", "col3")
    ][
      , diff := min2_value - min_value
    ][]

   col1 col2 col3 min_col min_value min2_col min2_value diff
1:    1  1.1  1.0    col1       1.0     col2        1.1  0.1
2:    2  1.9  2.2    col2       1.9     col1        2.0  0.1
3:    3  3.1  3.0    col1       3.0     col2        3.1  0.1

Upvotes: 3

Wimpel
Wimpel

Reputation: 27732

another approach (not the prettiest, but gets the job done)

# create q row_id
testdt[, id := .I]
# melt to long, order by id and value
dt.melt <- melt(testdt, id.vars = "id")[order(id, value), ]
# summarise by id
dt.melt <- dt.melt[, .(min_col = variable[1], 
            min_value = value[1],
            second_col = variable[!value == value[1]][1],
            second_val = value[!value == value[1]][1]), 
        by = .(id)]
# calcuate differences
dt.melt[, diff := second_val - min_value]
# merge
cbind(testdt, dt.melt[, -1])
#    col1 col2 col3 id min_col min_value second_col second_val diff
# 1:    1  1.1  1.0  1    col1       1.0       col2        1.1  0.1
# 2:    2  1.9  2.2  2    col2       1.9       col1        2.0  0.1
# 3:    3  3.1  3.0  3    col1       3.0       col2        3.1  0.1

Upvotes: 2

Related Questions