Reputation: 1703
I have a data.table
and need to know the index of the row containing a minimal value under a given condition. Simple example:
dt <- data.table(i=11:13, val=21:23)
# i val
# 1: 11 21
# 2: 12 22
# 3: 13 23
Now, suppose I'd like to know in which row val
is minimal under the condition i>=12
, which is 2 in this case.
dt[i>=12, which.min(val)]
# [1] 1
returns 1, because within dt[i>=12]
it is the first row.
Also
dt[i>=12, .I[which.min(val)]]
# [1] 1
returned 1, because .I
is only supposed to be used with grouping.
To apply .I
correctly, I added a grouping column:
dt[i>=12, g:=TRUE]
dt[i>=12, .I[which.min(val)], by=g][, V1]
# [1] 2
Note, that g
is NA
for i<12
, thus which.min
excludes that group from the result.
But, this requires extra computational power to add the column and perform the grouping. My productive data.table
has several millions of rows and I have to find the minimum very often, so I'd like to avoid any extra computations.
Do you have any idea, how to efficiently solve this?
Upvotes: 4
Views: 3241
Reputation: 66819
But, this requires extra computational power to add the column and perform the grouping.
So, keep the data sorted by it if it's so important:
setorder(dt, val)
dt[.(i_min = 12), on=.(i >= i_min), mult="first", which = TRUE]
# 2
This can also be extended to check more threshold i
values. Just give a vector in i_min =
:
dt[.(i_min = 9:14), on=.(i >= i_min), mult="first", which = TRUE]
# [1] 1 1 1 2 3 NA
How it works
x[i, on=, ...]
is the syntax for a join.
i
can be another table or equivalently a list of equal-length vectors..()
is a shorthand for list()
.on=
can have inequalities for a "non-equi join".mult=
can determine what happens when a row of i
has more than one match in x
.which=TRUE
will return row numbers of x
instead of the full joined table.Upvotes: 6
Reputation: 41990
You can use the fact that which.min
will ignore NA values to "mask" the values you don't want to consider:
dt[,which.min(ifelse(i>=12, val, NA))]
As a simple example of this behavior, which.min(c(NA, 2, 1))
returns 3, because the 3rd element is the min among all the non-NA values.
Upvotes: 1