bjhend
bjhend

Reputation: 1703

Apply which.min to data.table under a condition

I have a data.table and need to know the index of the row containing a minimal value under a given condition. Simple example:

dt <- data.table(i=11:13, val=21:23)

#     i  val
# 1: 11   21
# 2: 12   22
# 3: 13   23

Now, suppose I'd like to know in which row val is minimal under the condition i>=12, which is 2 in this case.

What didn't work:

dt[i>=12, which.min(val)]
# [1] 1

returns 1, because within dt[i>=12] it is the first row.

Also

dt[i>=12, .I[which.min(val)]]
# [1] 1

returned 1, because .I is only supposed to be used with grouping.

What did work:

To apply .I correctly, I added a grouping column:

dt[i>=12, g:=TRUE]
dt[i>=12, .I[which.min(val)], by=g][, V1]
# [1] 2

Note, that g is NA for i<12, thus which.min excludes that group from the result.

But, this requires extra computational power to add the column and perform the grouping. My productive data.table has several millions of rows and I have to find the minimum very often, so I'd like to avoid any extra computations.

Do you have any idea, how to efficiently solve this?

Upvotes: 4

Views: 3241

Answers (2)

Frank
Frank

Reputation: 66819

But, this requires extra computational power to add the column and perform the grouping.

So, keep the data sorted by it if it's so important:

setorder(dt, val)
dt[.(i_min = 12), on=.(i >= i_min), mult="first", which = TRUE]
# 2

This can also be extended to check more threshold i values. Just give a vector in i_min =:

dt[.(i_min = 9:14), on=.(i >= i_min), mult="first", which = TRUE]
# [1]  1  1  1  2  3 NA

How it works

x[i, on=, ...] is the syntax for a join.

  • i can be another table or equivalently a list of equal-length vectors.
  • .() is a shorthand for list().
  • on= can have inequalities for a "non-equi join".
  • mult= can determine what happens when a row of i has more than one match in x.
  • which=TRUE will return row numbers of x instead of the full joined table.

Upvotes: 6

Ryan C. Thompson
Ryan C. Thompson

Reputation: 41990

You can use the fact that which.min will ignore NA values to "mask" the values you don't want to consider:

dt[,which.min(ifelse(i>=12, val, NA))]

As a simple example of this behavior, which.min(c(NA, 2, 1)) returns 3, because the 3rd element is the min among all the non-NA values.

Upvotes: 1

Related Questions