Saksham

Reputation: 9390

data.frame: find last index of a value in each row

I have a data.frame like

  a b c d
1 1 0 0 1
2 1 1 0 0
3 0 1 0 0
4 1 0 1 0
5 1 0 0 0

I generated it using:

df <- data.frame(a = sample(0:1, 5, replace = TRUE), b = sample(0:1, 5, replace = TRUE), c = sample(0:1, 5, replace = TRUE), d = sample(0:1, 5, replace = TRUE))

How can I write a function that, when passed 1, returns 4, 2, 2, 3, 1, i.e. the index of the last column containing 1 in each row?

Upvotes: 2

Views: 1142

Answers (4)

Saksham

Reputation: 9390

Having gathered all the proposed solutions plus one of my own, here are the times taken by each, replicated 10,000 times:

apply(df,1,function(x){tail(which(x==1),1)})
user  system elapsed
2.978  0.010  2.988


apply(df*col(df),1,function(x){max(x)})
user  system elapsed
8.217  0.026  8.245



apply(df, 1, function(x) max(which(x == 1)))
user  system elapsed
1.621  0.005  1.627


max.col(df, "last")
user  system elapsed
1.348  0.004  1.352

Although @Mamoun Benghezal's answer is the fastest, it does not give me the flexibility I need. The accepted answer does.
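The timings above look like system.time() output, so a comparison of this kind could be sketched as follows. This is only an assumption about how the numbers were produced: the hard-coded df copies the data shown in the question, n is a hypothetical repetition count (smaller than the 10,000 used above so it runs quickly), and the pmax form from another answer is added for completeness.

```r
# Data copied from the question so the run is reproducible
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

n <- 200  # hypothetical repetition count; the post used 10,000

# Each call returns a proc_time object (user, system, elapsed)
t_tail   <- system.time(replicate(n, apply(df, 1, function(x) tail(which(x == 1), 1))))
t_col    <- system.time(replicate(n, apply(df * col(df), 1, function(x) max(x))))
t_which  <- system.time(replicate(n, apply(df, 1, function(x) max(which(x == 1)))))
t_maxcol <- system.time(replicate(n, max.col(df, "last")))
t_pmax   <- system.time(replicate(n, do.call(pmax, col(df) * df)))

# Collect the elapsed times side by side
elapsed <- c(tail = t_tail[["elapsed"]], col = t_col[["elapsed"]],
             which = t_which[["elapsed"]], max.col = t_maxcol[["elapsed"]],
             pmax = t_pmax[["elapsed"]])
print(elapsed)
```

Absolute numbers will differ by machine and R version, and on a 5-row data.frame the differences mostly reflect per-call overhead rather than how each approach scales.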

Upvotes: 0

akrun

Reputation: 887871

Another option is to use pmax. We multiply col(df) by 'df' and take the maximum value in each row.

  do.call(pmax,col(df)*df)
  #[1] 4 2 2 3 1

col(df) is a convenient function that returns a matrix holding the column index of every cell of the dataset.

  col(df)
  #     [,1] [,2] [,3] [,4]
  #[1,]    1    2    3    4
  #[2,]    1    2    3    4
  #[3,]    1    2    3    4
  #[4,]    1    2    3    4
  #[5,]    1    2    3    4

Multiplying 'df' by col(df), which has the same dimensions, leaves the '0' values at 0 and replaces each '1' with its column index, i.e.

 col(df)*df
 #  a b c d
 #1 1 0 0 4
 #2 1 2 0 0
 #3 0 2 0 0
 #4 1 0 3 0
 #5 1 0 0 0

Now we can get the maximum value of each row with do.call(pmax, ...).
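The same idea can be extended to an arbitrary value by multiplying col(df) by the logical matrix df == val instead of by df itself. The following is a minimal sketch rather than part of the original answer: last_index_pmax is a hypothetical helper name, and the df is copied from the question.

```r
# Data copied from the question
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

# Hypothetical helper: last column index holding `val` in each row
last_index_pmax <- function(df, val) {
  # df == val is a logical matrix; TRUE cells keep their column index, others become 0
  hits <- col(df) * (df == val)
  # pmax over the columns gives the largest (i.e. last) matching index per row
  res <- do.call(pmax, as.data.frame(hits))
  # a row maximum of 0 means `val` never occurred in that row
  res[res == 0] <- NA
  res
}

r1 <- last_index_pmax(df, 1)
r0 <- last_index_pmax(df, 0)
r2 <- last_index_pmax(df, 2)
```

Here r1 reproduces the 4 2 2 3 1 shown earlier, and rows that do not contain the value come back as NA instead of 0.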

Upvotes: 4

josliber

Reputation: 44340

One approach would be:

apply(df, 1, function(x) max(which(x == 1)))

If you wanted to be flexible about which element you're checking for and handle cases where the value is missing from a row:

max.row <- function(df, val) unname(apply(df, 1, function(x) tail(c(NA, which(x == val)), 1)))
max.row(df, 0)
# [1] 3 4 4 4 4
max.row(df, 1)
# [1] 4 2 2 3 1
max.row(df, 2)
# [1] NA NA NA NA NA
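To see why the c(NA, ...) prefix matters, it helps to look at a single row. The sketch below uses a hand-written vector x matching the first row of the question's data (an illustration, not part of the original answer): when the value is absent, which() returns an empty vector, max() of it yields -Inf with a warning, while tail() of the NA-prefixed vector falls back to NA.

```r
# First row of the question's data as a named vector
x <- c(a = 1, b = 0, c = 0, d = 1)

# Value absent from the row: which() returns an empty integer vector
empty <- which(x == 2)

# max() on an empty vector returns -Inf (and warns), which is a poor "not found" marker
bad <- suppressWarnings(max(empty))

# Prefixing NA guarantees tail() always has something to return
missing_case <- tail(c(NA, which(x == 2)), 1)

# Value present: the NA prefix is ignored because tail() takes the last element
present_case <- unname(tail(c(NA, which(x == 1)), 1))
```

The result carries the column name when the value is found, which is why the function above wraps everything in unname().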

Upvotes: 4

Mamoun Benghezal

Reputation: 5314

You can try max.col, which is a little faster than apply:

max.col(df, "last")
# [1] 2 4 4 2 4

Data

set.seed(1)
df <- data.frame(a = sample(0:1, 5, replace = TRUE), b = sample(0:1, 5, replace = TRUE), c = sample(0:1, 5, replace = TRUE), d = sample(0:1, 5, replace = TRUE))
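max.col(df, "last") returns the position of the last maximum in each row, so on 0/1 data it only finds the last 1, and a row of all zeros yields the last column. One possible way to make it search for any value, sketched here under the assumption that the data contain no NA, is to apply it to the indicator matrix df == val and mask rows without a match. last_index_maxcol is a hypothetical helper name, and the df below is copied from the question rather than generated with set.seed(1).

```r
# Data copied from the question
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

# Hypothetical helper: last column index holding `val` in each row
last_index_maxcol <- function(df, val) {
  hit <- df == val                                # logical matrix of matches
  res <- max.col(hit * 1, ties.method = "last")   # last TRUE per row (ties broken to the right)
  res[rowSums(hit) == 0] <- NA                    # rows with no match would otherwise report ncol(df)
  res
}

m1 <- last_index_maxcol(df, 1)
m0 <- last_index_maxcol(df, 0)
m2 <- last_index_maxcol(df, 2)
```

This keeps the vectorised speed of max.col while returning NA for rows where the value does not occur.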

Upvotes: 4
