Sunny League
Sunny League

Reputation: 149

Rearranging each row from largest value to smallest value in R

My data frame is set as follows:

Black      White       Red       Blue
 0.8        0.1        0.07      0.03
 0.3        0.6         0        0.1
 0.1        0.6        0.25      0.05

I wanted my data frame to look like this:

Black      White       Red       Blue     Color1     Color2     Color3    Color4
 0.8        0.1        0.07      0.03      0.8        0.1        0.07      0.03 
 0.3        0.6         0        0.1       0.6        0.3         0.1        0
 0.1        0.6        0.25      0.05      0.6        0.25        0.1      0.05

In which Color1 represents the largest value for each row, Color2 represents the second largest value, Color3 represents the third largest, and Color4 represents the smallest value for each row.

So far, I've used this function to obtain what I wanted, which is the result above:

maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]
df$Color1 <- apply(df, 1, max)
df$Color2 <- apply(df, 1, function(x)x[maxn(3)(x)])
df$Color3 <- apply(df, 1, function(x)x[maxn(4)(x)])
df$Color4 <- apply(df, 1, function(x)x[maxn(5)(x)])

Is there a more concise way for me to arrange my dataset?

Additionally, a bit off-topic: I'm not sure if it's because this is a CSV file that I'm working with that whenever I use the function

df$Color2 <- apply(df, 1, function(x)x[maxn(2)(x)])

It will return the same result as the function

apply(df, 1, max)

AND

apply(df, 1, function(x)x[maxn(1)(x)])

Upvotes: 0

Views: 257

Answers (2)

thelatemail
thelatemail

Reputation: 93938

It's quite a bit more complex but a speedier solution if you're dealing with a large number of rows is to only do the sorting/ordering once and re-insert it into a matrix shape:

matrix(x[order(-row(x), x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)

Some timings:

x <- matrix(rnorm(300000*5), nrow=300000, ncol=5)
system.time(t(apply(x, 1, sort, decreasing=TRUE)))
#   user  system elapsed 
#  14.13    0.00   14.13 
system.time(
  matrix(x[order(-row(x),x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)
)
#   user  system elapsed 
#   0.10    0.00    0.09 

Upvotes: 1

MKR
MKR

Reputation: 20095

One option is to use sort with apply, transpose and then cbind with data frame as:

cbind(df, t(apply(df, 1, sort, decreasing = TRUE)))

#   Black White  Red Blue   1    2    3    4
# 1   0.8   0.1 0.07 0.03 0.8 0.10 0.07 0.03
# 2   0.3   0.6 0.00 0.10 0.6 0.30 0.10 0.00
# 3   0.1   0.6 0.25 0.05 0.6 0.25 0.10 0.05

Updated: Based on suggestion from @dww column names can be assigned as:

df[paste0('color',1:4)] = t(apply(df, 1, sort, decreasing = TRUE))

# Black White  Red Blue color1 color2 color3 color4
# 1   0.8   0.1 0.07 0.03    0.8   0.10   0.07   0.03
# 2   0.3   0.6 0.00 0.10    0.6   0.30   0.10   0.00
# 3   0.1   0.6 0.25 0.05    0.6   0.25   0.10   0.05

Upvotes: 2

Related Questions