user3641630

Reputation: 323

Get the value and position of column based on a variable

Here is code to reproduce my dataset.

col1=c(20,15,NA,NA)
col2=c(30,30,6,NA)
col3=c(40,NA,7,NA)
col4=c(NA,60,8,NA)
col5=c(60,75,9,NA)
check=c(40,35,10,NA)

df=data.frame(col1,col2,col3,col4,col5,check)

For each row, I would like to get the position of the first column whose value is greater than the value in the "check" column. If possible, I would also like to get the value of that column.

Here is a function that I created, which does not work:

fun=function(x){
        j1=which(x>df$check)[1]
        if(is.na(j1)){
                NA
        }
        else if (!is.na(j1)){
                j1
        }
}

df$test=apply(df[,1:5],1,fun)

My final data frame would look like this:

col1=c(20,15,NA,NA)
col2=c(30,30,6,NA)
col3=c(40,NA,7,NA)
col4=c(NA,60,8,NA)
col5=c(60,75,9,NA)
check=c(40,35,10,NA)
test=c(5,4,NA,NA)
value=c(60,60,NA,NA)
df=data.frame(col1,col2,col3,col4,col5,check, test,value)

Any help would be appreciated. Thanks

Upvotes: 2

Views: 409

Answers (1)

akrun

Reputation: 886938

We can use max.col to get, for each row, the index of the first column that is greater than check. Using the sequence of rows together with that column index, we can then extract the corresponding elements from the first five columns.

#create a logical matrix: TRUE where a column value is greater than check
m1 <- df[1:5] > df$check
#change the NA elements to FALSE
m1[is.na(m1)] <- FALSE
#use max.col to get the index of the first TRUE in each row. For rows that
#are all FALSE, multiplying by `rowSums(m1)!=0` changes the index to 0.
v1 <- max.col(m1, 'first')*(rowSums(m1)!=0)
#convert the 0 values to NA
test <-  NA^(v1==0)*v1
#extract the elements using a row/column index matrix
value <- df[1:5][cbind(1:nrow(df), test)]
#cbind the new vectors to get the desired output
df <- cbind(df, test, value)
df
#   col1 col2 col3 col4 col5 check test value
#1   20   30   40   NA   60    40    5    60
#2   15   30   NA   60   75    35    4    60
#3   NA    6    7    8    9    10   NA    NA
#4   NA   NA   NA   NA   NA    NA   NA    NA
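The two less obvious steps above are the handling of all-FALSE rows and the NA^ trick. Here is a minimal sketch on a toy matrix; the names m, idx, idx_na and idx_alt are made up for illustration, and replace() is shown only as a more explicit alternative, not as part of the solution above.

```r
# Toy logical matrix: row 1 has TRUE in columns 2 and 3, row 2 has no TRUE
m <- matrix(c(FALSE, TRUE,  TRUE,
              FALSE, FALSE, FALSE), nrow = 2, byrow = TRUE)

# Column of the first TRUE per row; an all-FALSE row is a tie, so it gives 1
idx <- max.col(m, ties.method = "first")

# Set the index to 0 for rows that contain no TRUE at all
idx <- idx * (rowSums(m) != 0)

# NA^TRUE is NA and NA^FALSE is 1, so the zeros become NA
idx_na <- NA^(idx == 0) * idx

# A more explicit way to get the same result
idx_alt <- replace(idx, idx == 0, NA)
```

Both idx_na and idx_alt should hold 2 for the first row and NA for the second.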

Alternatively, both columns can be created with apply. Although this is more compact, it may be less efficient than the first solution. We loop through the rows using apply with MARGIN=1, get the numeric index of the elements among the first five that are greater than the 6th value (check), and take the first occurrence with [1]; if no element qualifies, this yields NA automatically. Using this index, we subset the element, concatenate the index and the value, transpose the result, and assign it to new columns in 'df'.

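df[c('test', 'value')] <- t(apply(df, 1, function(x) {
    # index of the first of columns 1 to 5 greater than the 6th (check); NA if none
    i1 <- which(x[1:5] > x[6])[1]
    # return the index together with the value at that index
    c(i1, x[i1])
}))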
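For a self-contained check, the apply approach can be run against the data from the question. This is only a sketch: res is a name introduced here for the intermediate matrix, and the data frame is rebuilt from scratch so it does not depend on the first solution having been run.

```r
# Rebuild the question's data
df <- data.frame(col1  = c(20, 15, NA, NA),
                 col2  = c(30, 30, 6, NA),
                 col3  = c(40, NA, 7, NA),
                 col4  = c(NA, 60, 8, NA),
                 col5  = c(60, 75, 9, NA),
                 check = c(40, 35, 10, NA))

# One row of output per input row: the first qualifying index and its value
res <- t(apply(df, 1, function(x) {
  i1 <- which(x[1:5] > x[6])[1]  # NA when nothing exceeds check
  c(i1, x[i1])
}))

df[c("test", "value")] <- res
df$test   # 5 4 NA NA
df$value  # 60 60 NA NA
```

Rows 3 and 4 end up as NA because no column exceeds check in row 3 and every comparison in row 4 is NA.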

Upvotes: 2
