wolfsatthedoor

Reputation: 7313

Efficiently finding the first nonzero element (corresponding column) of a data.table

There are some answers on Stack Overflow to this type of question, but they are all inefficient and do not scale well.

To reproduce it, suppose I have data that looks like this:

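    library(data.table)
    # Five example rows, filled row-wise into a 5 x 4 matrix
    tempmat <- matrix(c(1,1,0,4, 1,0,0,4, 0,1,0,4, 0,1,1,4, 0,1,0,5), 5, 4, byrow = TRUE)
    # Prepend an all-zero row to cover the "no nonzero element" case
    tempmat <- rbind(rep(0, 4), tempmat)
    tempmat <- data.table(tempmat)
    setnames(tempmat, paste0('prod1vint', 1:4))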

This is what the data look like. The real table is much bigger, so the solution cannot rely on apply() or any other row-wise approach.

    > tempmat
       prod1vint1 prod1vint2 prod1vint3 prod1vint4
    1:          0          0          0          0
    2:          1          1          0          4
    3:          1          0          0          4
    4:          0          1          0          4
    5:          0          1          1          4
    6:          0          1          0          5

I want to identify the column index of the first nonzero element in each row (NA if the row has none), so the output would look like this:

    > tempmat
       prod1vint1 prod1vint2 prod1vint3 prod1vint4 firstnonzero
    1:          0          0          0          0           NA
    2:          1          1          0          4            1
    3:          1          0          0          4            1
    4:          0          1          0          4            2
    5:          0          1          1          4            2
    6:          0          1          0          5            2

Upvotes: 2

Views: 181

Answers (1)

Ronak Shah

Reputation: 389135

One option is to use rowSums together with max.col, specifying ties.method = "first":

    # TRUE where an element is nonzero
    temp <- tempmat != 0
    # NA^TRUE is NA and NA^FALSE is 1, so all-zero rows become NA
    (NA^(rowSums(temp) == 0)) * max.col(temp, ties.method = "first")
    #[1] NA  1  1  2  2  2

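max.col returns the column index of the first maximum value in each row. Because temp is logical, that is the first TRUE, i.e. the first nonzero element. However, it returns 1 when all values in a row are 0 (as in the 1st row), since FALSE is then the maximum value in the row. To handle that, we use rowSums to check whether the row has at least one nonzero value: NA^(rowSums(temp) == 0) evaluates to NA for all-zero rows and to 1 otherwise, and multiplying it by the max.col output turns the index for all-zero rows into NA.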
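To get the result into the table as the firstnonzero column from the question, something along these lines should work. This is only a sketch under a few assumptions: the example data are rebuilt inline so the snippet runs on its own, the helper names nz and first are made up for illustration, and the NA step is written as an explicit subassignment instead of the NA^ multiplication, which is a stylistic swap and not a different method.

    library(data.table)

    # Example data from the question, rebuilt column by column
    tempmat <- data.table(
      prod1vint1 = c(0, 1, 1, 0, 0, 0),
      prod1vint2 = c(0, 1, 0, 1, 1, 1),
      prod1vint3 = c(0, 0, 0, 0, 1, 0),
      prod1vint4 = c(0, 4, 4, 4, 4, 5)
    )

    # Logical matrix: TRUE where an element is nonzero
    nz <- as.matrix(tempmat) != 0

    # Column index of the first TRUE in each row (1 for all-zero rows)
    first <- max.col(nz, ties.method = "first")

    # Rows without any nonzero element get NA instead of the spurious 1
    first[rowSums(nz) == 0] <- NA

    # Add the column by reference, without copying the table
    tempmat[, firstnonzero := first]

Everything here operates on the whole matrix at once, so there is no apply or per-row loop. Note that as.matrix makes one copy of the data, which may matter for a very large table.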

Upvotes: 2
