R return column name based on conditions

Question

For the following data frame:

DF <- data.frame(Row=c(1,2,3,4,5),`2.04`=c(1,1,0,1,1),`2.05`=c(0,0,0,0,1),
       `2.06`=c(1,0,0,0,1),`2.07`=c(1,0,0,0,1),`2.08`=c(1,1,1,0,0), check.names = F)

For each row, I'd like to return, in a new vector, the name of the column that (a) has a value greater than 0 in that row and (b) has the highest column name of all those that meet condition (a), such that:

DF <- data.frame(Row=c(1,2,3,4,5),`2.04`=c(1,1,0,1,1),`2.05`=c(0,0,0,0,1),
                `2.06`=c(1,0,0,0,1),`2.07`=c(1,0,0,0,1),`2.08`=c(1,1,1,0,0),
                Results=c(2.08,2.08,2.08,2.04,2.07), check.names = F)

So for row 2 the columns 2.04 and 2.08 meet condition (a), and only 2.08 meets condition (b) because 2.08>2.04.

dplyr or data.table would be preferred.

lmo · Accepted Answer

You could also use max.col like this:

DF$results <- names(DF[-1])[max.col(DF[-1], "last")]
DF
  Row 2.04 2.05 2.06 2.07 2.08 results
1   1    1    0    1    1    1    2.08
2   2    1    0    0    0    1    2.08
3   3    0    0    0    0    1    2.08
4   4    1    0    0    0    0    2.04
5   5    1    1    1    1    0    2.07

max.col returns the column position of the maximum value for each row. It takes a second argument, ties.method, which is set to "last" here so that, when several columns tie for the maximum, the largest column position is returned for each row. These column positions are used with [ to extract the column names, which come back as a character vector; wrap the expression in as.numeric() if numeric values are needed.
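
One caveat worth sketching: max.col looks for the row maximum, not for "greater than 0", so a row with no positive value would still return the last column, since all of its entries tie. The variant below is a minimal sketch, not part of the original answer. It tests condition (a) explicitly, returns NA for rows with no match, and converts the names to numeric. It assumes the value columns are already ordered by increasing name, as they are in the question; vals, hit and last_pos are illustrative names.

```r
# Sample data from the question; check.names = FALSE keeps names such as "2.04".
DF <- data.frame(Row = c(1, 2, 3, 4, 5),
                 `2.04` = c(1, 1, 0, 1, 1), `2.05` = c(0, 0, 0, 0, 1),
                 `2.06` = c(1, 0, 0, 0, 1), `2.07` = c(1, 0, 0, 0, 1),
                 `2.08` = c(1, 1, 1, 0, 0), check.names = FALSE)

vals <- as.matrix(DF[-1])          # value columns only, Row dropped
hit  <- (vals > 0) * 1             # condition (a) as a 0/1 numeric matrix

# Position of the last column that meets condition (a) in each row.
# Assumes columns are sorted by name, so "last" also means "highest name".
last_pos <- max.col(hit, ties.method = "last")

# Rows with no positive value would otherwise report the last column.
last_pos[rowSums(hit) == 0] <- NA

# Column names are character; convert to numeric for the result vector.
DF$results <- as.numeric(colnames(vals)[last_pos])
DF$results
```

For the sample data every row has at least one positive value, so the NA branch is never triggered and the result matches the vector requested in the question.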
