Reputation: 347
I am trying to write some code which identifies the greatest two values for each row and provides their column number and value.
df <- data.frame(car = c(2,1,1,1,0), bus = c(0,2,0,1,0),
                 walk = c(0,3,2,0,0), bike = c(0,4,0,0,1))
I've managed to get it to do this for the maximum value using the max and max.col functions.
df$max = max.col(df,ties.method="first")
df$val = apply(df[ ,1:4], 1, max)
As far as I know there are no equivalent functions for the second highest value, so this part has been trickier. The code below returns the second highest value, but (importantly) not when the maximum is tied. It also looks fragile.
sec.fun <- function (x) {
max( x[x!=max(x)] )
}
df$val2 <- apply(df[ ,1:4], 1, sec.fun)
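To make the tie problem concrete, here is a small sketch of where sec.fun goes wrong. It rebuilds the data and the function so it runs on its own; row4 and flat are names made up for this illustration.

```r
# Data and function from above, repeated so the snippet is self-contained
df <- data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
sec.fun <- function(x) max(x[x != max(x)])

# Row 4 is (1, 1, 0, 0): both copies of the maximum are filtered out,
# so the result is 0 although the second highest value is 1
row4 <- sec.fun(unlist(df[4, ]))

# If every value in a row is equal, nothing survives the filter and
# max() of an empty vector gives -Inf (with a warning, suppressed here)
flat <- suppressWarnings(sec.fun(c(1, 1, 1)))
```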
Ideally the solution would not involve removing any original data and could be extended to find the third, fourth, ... highest value, but neither of these is an essential requirement.
Upvotes: 8
Views: 10224
Reputation: 438
Here's a data.table solution to identify and record the max column, max value, 2nd largest column, and 2nd largest value for specified columns.
# Library
library(data.table)
# Data
set.seed(123)
df=data.table(V1=rnorm(10),V2=rnorm(10),V3=rnorm(10),V4=letters[1:10])
# MaxColumn
tmp=c('V1','V2','V3') # Search only in these columns
df[,MaxColumn:=apply(.SD,1,FUN=which.max),.SDcols=tmp]
# MaxValue
df[,MaxValue:=apply(.SD,1,FUN=max),.SDcols=tmp]
# Rank2Column (2nd largest); order() is used instead of rank() so tied values still give exactly one position
df[,Rank2Column:=apply(.SD,1,function(x) order(x,decreasing=TRUE)[2]),.SDcols=tmp]
# Rank2Value
df[,Rank2Value:=apply(.SD,1,function(x) sort(x,decreasing=TRUE)[2]),.SDcols=tmp]
Upvotes: 0
Reputation: 66842
Try this:
# a function that returns the position of n-th largest
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]
This is a closure, so you can use it like this:
> # position of the largest
> apply(df, 1, maxn(1))
[1] 1 4 3 1 4
> # position of the 2nd largest
> apply(df, 1, maxn(2))
[1] 2 3 1 2 1
>
> # value of the largest
> apply(df, 1, function(x)x[maxn(1)(x)])
[1] 2 4 2 1 1
> # value of the 2nd largest
> apply(df, 1, function(x)x[maxn(2)(x)])
[1] 0 3 1 1 0
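The values can also be pulled out without a second apply call, by indexing a matrix with (row, column) pairs. This is only a sketch, assuming the four-column df from the question and the maxn closure above; m, pos2, val2 and col2 are names made up for illustration.

```r
# Data from the question and the closure from this answer
df <- data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

m    <- as.matrix(df)                     # numeric matrix, one row per observation
pos2 <- apply(m, 1, maxn(2))              # column position of the 2nd largest
val2 <- m[cbind(seq_len(nrow(m)), pos2)]  # matching values via (row, column) pairs
col2 <- colnames(m)[pos2]                 # column names instead of positions
```

Since df itself is never modified, this also keeps the original data intact.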
Updated
Why use a closure here?
One reason is that you can define a function such as:
max2 <- maxn(2)
max3 <- maxn(3)
Then use them:
> apply(df, 1, max2)
[1] 2 3 1 2 1
> apply(df, 1, max3)
[1] 3 2 2 3 2
I'm not sure the advantage is obvious, but I like this approach because it is more functional in style.
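Another possible use is generating the positions for several ranks in one go. A rough sketch, again assuming the four-column df from the question; ranks and pos are names made up for illustration.

```r
# Data from the question and the closure from this answer
df <- data.frame(car = c(2, 1, 1, 1, 0), bus = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

ranks <- 1:3
# One column per rank: positions of the largest, 2nd largest, 3rd largest
pos <- sapply(ranks, function(n) apply(df, 1, maxn(n)))
colnames(pos) <- paste0("pos", ranks)
```

Each column of pos matches the corresponding apply(df, 1, maxn(n)) call shown earlier.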
Upvotes: 25