I have a JxK dataframe M and I want to calculate the following:
1. The minimum of each row, giving a vector A_j of length J.
2. The minimum of each column, giving a vector A_k of length K.
3. For each element of A_j, its position in the sorted vector C = sort(c(A_j, A_k)).
4. For each element of A_k, its position in C.
In vectors #3 and #4, all ties should be given the first index at which that value appears in C. That is, if A_j[i] and A_j[i+1] are equal, then elements i and i + 1 of vector #3 should both equal the position of A_j[i] in C.
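To make the tie rule concrete, here is a small hand-made example. The values in A_j and A_k are hypothetical, chosen only so that ties occur, and the lookup is the same which(C == x)[1] used in the proof of concept:

```r
# Hypothetical minima, chosen so that the value 3 is tied
A_j <- c(3, 1, 3)          # row minima
A_k <- c(2, 3)             # column minima
C <- sort(c(A_j, A_k))     # 1 2 3 3 3

# For each minimum, the first position at which it appears in C
vec3 <- sapply(A_j, function(x) which(C == x)[1])
vec4 <- sapply(A_k, function(x) which(C == x)[1])

print(vec3)  # 3 1 3
print(vec4)  # 2 3
```

Every 3 maps to position 3, the first slot of the run of 3s in C.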
As always, this is not hard to do inefficiently. However, in practice, the dataframe is very big, so inefficient solutions fail.
As a proof of concept, one solution would be as follows.
# Create the dataframe
set.seed(1)
df <- data.frame(matrix(rnorm(50, 8, 2), 10)) # A 10x5 data frame
# Calculate 1 and 2
A.j <- apply(df, 1, min)
A.k <- apply(df, 2, min)
# Calculate 3 and 4
C <- sort(unname(c(A.j, A.k)))
A.j.indices <- apply(df, 1, function(x) which(x == min(x)))
A.k.indices <- apply(df, 2, function(x) which(x == min(x)))
vec3out <- c()
vec4out <- c()
for(j in 1:nrow(df)){
  rank <- which(C == A.j[j])[1]
  vec3out <- c(vec3out, rank)
}
for(k in 1:ncol(df)){
  rank <- which(C == A.k[k])[1]
  vec4out <- c(vec4out, rank)
}
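As a sanity check on the loops above, here is a minimal sketch that rebuilds the same example data and confirms that base R's match(), which returns the position of the first match of each value in its table, produces the same positions. The loops here preallocate their outputs instead of growing them with c(); the names vec3match and vec4match are illustrative only.

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(matrix(rnorm(50, 8, 2), 10))

A.j <- apply(df, 1, min)            # row minima
A.k <- apply(df, 2, min)            # column minima
C <- sort(unname(c(A.j, A.k)))

# Loop version, with preallocated integer outputs
vec3out <- integer(nrow(df))
for (j in seq_len(nrow(df))) vec3out[j] <- which(C == A.j[j])[1]
vec4out <- integer(ncol(df))
for (k in seq_len(ncol(df))) vec4out[k] <- which(C == A.k[k])[1]

# match() gives the first position of each value in C, so ties resolve the same way
vec3match <- match(A.j, C)
vec4match <- match(unname(A.k), C)

stopifnot(all(vec3match == vec3out), all(vec4match == vec4out))
```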
For starters, you should use a matrix: data.frames are less efficient for this kind of numeric work (see "Should I use a data.frame or a matrix?"). Then, use apply functions and match() instead of growing vectors inside for loops.
Let M be your data.frame coerced to a matrix.
M <- as.matrix(M)
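minByRow <- apply(M, MARGIN=1, FUN=min)  # row minima (A_j)
minByCol <- apply(M, MARGIN=2, FUN=min)  # column minima (A_k)
combinedSorted <- sort(c(minByRow, minByCol))  # C
# match() returns the first position of each value in combinedSorted,
# so tied values all get the index of their first occurrence
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)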
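The sort-then-match step can also be expressed with rank(). For the combined vector, rank(..., ties.method = "min") gives each value 1 plus the number of strictly smaller values, which is exactly the first position of that value in the sorted vector. Below is a minimal sketch on a hypothetical 3x3 matrix with tied minima. It uses min, since the question asks for the minimum values. Whether rank() is actually faster than sort() plus match() on real data is an assumption worth benchmarking.

```r
# Hypothetical small matrix whose row and column minima contain ties
M <- matrix(c(3, 5, 1,
              4, 2, 1,
              3, 6, 7), nrow = 3, byrow = TRUE)

minByRow <- apply(M, MARGIN = 1, FUN = min)   # 1 1 3
minByCol <- apply(M, MARGIN = 2, FUN = min)   # 3 2 1

# sort + match, as in the answer
combinedSorted <- sort(c(minByRow, minByCol))
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)

# rank with ties.method = "min": 1 + count of strictly smaller values
r <- rank(c(minByRow, minByCol), ties.method = "min")
byRowRank <- r[seq_along(minByRow)]
byColRank <- r[length(minByRow) + seq_along(minByCol)]

print(byRowOutput)  # 1 1 5
print(byColOutput)  # 5 4 1
stopifnot(all(byRowOutput == byRowRank), all(byColOutput == byColRank))
```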
Here are the results for 1 million observations of 100 variables:
M <- matrix(data=rnorm(100000000), nrow=1000000, ncol=100)
system.time({
minByRow <- apply(M, MARGIN=1, FUN=which.min)
minByCol <- apply(M, MARGIN=2, FUN=which.min)
combinedSorted <- sort(c(minByRow, minByCol))
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)
})
user system elapsed
7.37 0.46 7.93
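With 1 million rows, apply(M, MARGIN=1, ...) calls its function once per row, which is presumably where much of that time goes. One possible base R alternative, not part of the original answer, is to compute the row minima with a single pmin() call over the columns. Here is a minimal sketch on a smaller random matrix that checks the result against the apply() version. The matrix size is arbitrary and no timings are claimed.

```r
# Smaller random matrix for illustration only
set.seed(1)
M <- matrix(rnorm(10000 * 20), nrow = 10000, ncol = 20)

# Row minima without a per-row apply(): pmin() compares the columns
# element-wise, so it is called once with ncol(M) vectors
columns <- lapply(seq_len(ncol(M)), function(k) M[, k])
minByRow <- do.call(pmin, columns)

# Column minima: only ncol(M) calls to min(), so apply() is cheap here
minByCol <- apply(M, MARGIN = 2, FUN = min)

combinedSorted <- sort(c(minByRow, minByCol))
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)

# Same row minima as the apply() version
stopifnot(identical(minByRow, apply(M, MARGIN = 1, FUN = min)))
```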