Virag Swami

Reputation: 197

Find the indices of top n elements in a row after ignoring selected indices

I have a data frame df1 and a list l1 like this:

df1 <- data.frame(c1 = c(4.2, 1.2, 3.0), c2 = c(2.3, 1.8, 12.0), c3 = c(1.2, 3.2, 2.0), c4 = c(2.2, 1.9, 0.9))
l1 <- list(x1 = c(2, 4), x2 = c(3), x3 = c(2))

Here l1 contains, for each row of df1, the column indices to ignore. For every row, I want to find the indices of the top 2 elements (the number can be higher) after excluding the indices in l1. The actual data has many more rows and columns. The expected output is:

[1,]    1 3
[2,]    2 4
[3,]    1 3

where df1 is:

      c1   c2  c3  c4
1    4.2  2.3 1.2 2.2
2    1.2  1.8 3.2 1.9
3    3.0 12.0 2.0 0.9

It would also be helpful if the indices could be returned in order of the values they point to, largest first. Then the expected output would be:

 [1,]    1 3
 [2,]    4 2
 [3,]    1 3

Upvotes: 2

Views: 54

Answers (3)

akrun

Reputation: 887118

We can use rank after turning the excluded positions into NA:

lapply(seq_len(nrow(df1)), function(i) {
  x1 <- unlist(df1[i, ])
  i2 <- l1[[i]]
  i3 <- seq_along(x1) %in% i2
  which(rank(-x1 * NA^i3) %in% 1:2)
})
#[[1]]
#[1] 1 3

#[[2]]
#[1] 2 4

#[[3]]
#[1] 1 3
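
The `NA^i3` step is probably the least obvious part of the code above. As a small illustration on the first row only (the names `mask`, `masked` and `r` are introduced here for the example and do not appear in the answer): in R, `NA^0` is 1 and `NA^1` is `NA`, so raising `NA` to a logical vector gives `NA` at the excluded positions and 1 everywhere else. `rank()` keeps `NA` as `NA` by default, so those positions can never match `1:2`.

```r
# First row of df1 from the question, with columns 2 and 4 to be ignored
x1 <- c(c1 = 4.2, c2 = 2.3, c3 = 1.2, c4 = 2.2)
i3 <- seq_along(x1) %in% c(2, 4)

mask <- NA^i3          # NA where i3 is TRUE, 1 where i3 is FALSE
masked <- -x1 * mask   # negate so the largest value gets rank 1
r <- rank(masked)      # default na.last = "keep": NA stays NA

which(r %in% 1:2)
# [1] 1 3
```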

Update

If we need the indices ordered by value (largest first):

lapply(seq_len(nrow(df1)), function(i) {
  x1 <- unlist(df1[i, ])
  i2 <- l1[[i]]
  i3 <- seq_along(x1) %in% i2
  i4 <- which(rank(-x1 * NA^i3) %in% 1:2)
  i4[order(-x1[i4])]
})
#[[1]]
#[1] 1 3

#[[2]]
#[1] 4 2

#[[3]]
#[1] 1 3
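
Since the question mentions that the number of top elements can be higher than 2, the same idea can be wrapped in a function with the count as an argument. This is only a sketch: `top_idx` and its parameter `n` are hypothetical names, and `ties.method = "first"` is an addition that is not in the code above (with the default, tied values get averaged ranks such as 1.5, which would not match `seq_len(n)`).

```r
df1 <- data.frame(c1 = c(4.2, 1.2, 3.0), c2 = c(2.3, 1.8, 12.0),
                  c3 = c(1.2, 3.2, 2.0), c4 = c(2.2, 1.9, 0.9))
l1 <- list(x1 = c(2, 4), x2 = c(3), x3 = c(2))

# Hypothetical wrapper: top n indices per row, largest value first
top_idx <- function(df, excl, n = 2) {
  lapply(seq_len(nrow(df)), function(i) {
    x1 <- unlist(df[i, ])
    i3 <- seq_along(x1) %in% excl[[i]]
    # ties.method = "first" is an assumption added here to break ties
    i4 <- which(rank(-x1 * NA^i3, ties.method = "first") %in% seq_len(n))
    i4[order(-x1[i4])]
  })
}

top_idx(df1, l1, n = 3)[[2]]
# [1] 4 2 1
```

If a row has fewer than `n` elements left after the exclusions, this simply returns the ones that remain.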

Upvotes: 3

gowerc

Reputation: 1099

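This also uses rank but returns a matrix. The syntax is a little ugly because t() converts the data.frame into a matrix, so it has to be wrapped in as.data.frame() again before mapply() can iterate over the original rows.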

df1 <- data.frame(c1 = c(4.2, 1.2, 3.0), c2 = c(2.3, 1.8, 12.0), c3 = c(1.2, 3.2, 2.0), c4 = c(2.2, 1.9, 0.9))
l1 <- list(x1 = c(2, 4), x2 = c(3), x3 = c(2))


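# df is one row of the data (passed as a vector), excl the positions to ignore
indexOrderSub <- function(df, excl, top = 2) {
    z <- seq_along(df)                           # all positions
    sel <- !(z %in% excl)                        # TRUE for positions that are kept
    rz <- z[sel]
    rz2 <- tail(rz[order(rank(df)[sel])], top)   # positions of the `top` largest kept values
    rz2[order(rz2)]                              # returned in index order
}


t(mapply(indexOrderSub, as.data.frame(t(df1)), l1))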
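
Because the question says the real data is much larger, a variant that masks all excluded cells in one step with matrix indexing may also be worth a look. This is a sketch rather than a tested solution for large data: `m`, `skip`, `top` and `res` are names made up for the example, and it assumes every row keeps at least `top` values after the exclusions (otherwise `apply()` returns a list instead of a matrix). It relies on `order(..., na.last = NA)` dropping the `NA` positions.

```r
df1 <- data.frame(c1 = c(4.2, 1.2, 3.0), c2 = c(2.3, 1.8, 12.0),
                  c3 = c(1.2, 3.2, 2.0), c4 = c(2.2, 1.9, 0.9))
l1 <- list(x1 = c(2, 4), x2 = c(3), x3 = c(2))

m <- as.matrix(df1)
# Two-column matrix of (row, column) pairs for the cells to ignore
skip <- cbind(rep(seq_along(l1), lengths(l1)), unlist(l1))
m[skip] <- NA

top <- 2
# For each row, positions of the `top` largest remaining values, largest first
res <- t(apply(m, 1, function(r) head(order(r, decreasing = TRUE, na.last = NA), top)))
res
```

For the example data the rows of `res` are 1 3, 4 2 and 1 3, which matches the second expected output in the question.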

Upvotes: 1

Qaswed

Reputation: 3879

I understand the question as follows: for each row i of df1, exclude the elements at the positions given in l1[[i]], then return the indices of the two largest remaining elements.

highest.two <- function(x){
  first.highest_position <- which.max(x)
  x[first.highest_position] <- -Inf  # mask the maximum so the positions do not shift
  second.highest_position <- which.max(x)  # a single position, even if values are tied
  return(unname(c(first.highest_position, second.highest_position)))
}

ret <- matrix(NA, nrow = nrow(df1), ncol = 2)
for(i in seq_len(nrow(df1))){
  tmp <- unlist(df1[i, ])  # plain numeric vector instead of a one-row data.frame
  tmp[l1[[i]]] <- -Inf
  ret[i, ] <- highest.two(tmp) #if you want to have these indices ordered use sort(highest.two(tmp))
}
ret

Upvotes: 2
