Reputation: 141
Consider the following example:
Var_A <- sample(1:100,5,replace=TRUE)
Var_B <- sample(1:100,5,replace=TRUE)
Var_C <- sample(1:100,5,replace=TRUE)
Var_D <- sample(1:100,5,replace=TRUE)
DF <- as.data.frame(cbind(Var_A,Var_B,Var_C,Var_D))
In R, functions already exist to find the element-wise max and min, so I could easily create a new variable that is equal to the largest (or smallest) value across the columns of interest:
> DF$Max <- pmax(Var_A,Var_B,Var_C,Var_D)
> DF
Var_A Var_B Var_C Var_D Max
1 44 33 6 72 72
2 29 66 51 12 66
3 35 29 47 79 79
4 39 79 47 65 79
5 97 60 36 81 97
But what if I need to create a variable that captures, say, the second largest value in each row (i.e., across the columns)?
In the real data set that I'm working with, I have 600+ columns and about 28 million records. I need to create variables that will identify and store the largest, second largest, third largest, etc. values found when looking across the variables (columns) for each record, much like pmax
would do, but for other ordinals.
The only way that I have been able to functionally make it work on a subset of the data is to do a loop, but that loop won't finish in my lifetime if I run it on the entire data set. I also considered using the apply
function, but my understanding is that apply
will convert the data set to a matrix first, which my data set won't take kindly to.
Any suggestions on a non-loop way to do this? And with this amount of data, the faster the better...
Upvotes: 0
Views: 108
Reputation: 3755
This may be a solution...
Var_A <- sample(1:100,5,replace=TRUE)
Var_B <- sample(1:100,5,replace=TRUE)
Var_C <- sample(1:100,5,replace=TRUE)
Var_D <- sample(1:100,5,replace=TRUE)
DF <- as.data.frame(cbind(Var_A,Var_B,Var_C,Var_D))
result <-sapply(1:nrow(DF), function(x) {
df <- as.data.frame(DF[x,])
ord <- df[order(-DF[x,])]
})
result <- t(result)
output <- cbind(DF,result)
for (i in (ncol(DF)+1):ncol(output) ) {
colnames(output)[i]<-paste0("Max",i-ncol(DF))
}
output
Var_A Var_B Var_C Var_D Max1 Max2 Max3 Max4
1 42 12 64 9 64 42 12 9
2 67 22 47 4 67 47 22 4
3 80 56 82 94 94 82 80 56
4 31 62 88 73 88 73 62 31
5 91 67 15 41 91 67 41 15
Upvotes: 1