Reputation: 149
My data frame is set as follows:
Black White Red Blue
0.8 0.1 0.07 0.03
0.3 0.6 0 0.1
0.1 0.6 0.25 0.05
I wanted my data frame to look like this:
Black White Red Blue Color1 Color2 Color3 Color4
0.8 0.1 0.07 0.03 0.8 0.1 0.07 0.03
0.3 0.6 0 0.1 0.6 0.3 0.1 0
0.1 0.6 0.25 0.05 0.6 0.25 0.1 0.05
In which Color1 represents the largest value for each row, Color2 represents the second largest value, Color3 represents the third largest, and Color4 represents the smallest value for each row.
So far, I've used this function to obtain what I wanted, which is the result above:
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]
df$Color1 <- apply(df, 1, max)
df$Color2 <- apply(df, 1, function(x)x[maxn(3)(x)])
df$Color3 <- apply(df, 1, function(x)x[maxn(4)(x)])
df$Color4 <- apply(df, 1, function(x)x[maxn(5)(x)])
Is there a more concise way for me to arrange my dataset?
Additionally, a bit off-topic: I'm not sure if it's because this is a CSV file that I'm working with that whenever I use the function
df$Color2 <- apply(df, 1, function(x)x[maxn(2)(x)])
It will return the same result as the function
apply(df, 1, max)
AND
apply(df, 1, function(x)x[maxn(1)(x)])
Upvotes: 0
Views: 257
Reputation: 93938
It's quite a bit more complex but a speedier solution if you're dealing with a large number of rows is to only do the sorting/ordering once and re-insert it into a matrix shape:
matrix(x[order(-row(x), x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)
Some timings:
x <- matrix(rnorm(300000*5), nrow=300000, ncol=5)
system.time(t(apply(x, 1, sort, decreasing=TRUE)))
# user system elapsed
# 14.13 0.00 14.13
system.time(
matrix(x[order(-row(x),x, decreasing=TRUE)], nrow=nrow(x), ncol=ncol(x), byrow=TRUE)
)
# user system elapsed
# 0.10 0.00 0.09
Upvotes: 1
Reputation: 20095
One option is to use sort
with apply
, transpose
and then cbind
with data frame as:
cbind(df, t(apply(df, 1, sort, decreasing = TRUE)))
# Black White Red Blue 1 2 3 4
# 1 0.8 0.1 0.07 0.03 0.8 0.10 0.07 0.03
# 2 0.3 0.6 0.00 0.10 0.6 0.30 0.10 0.00
# 3 0.1 0.6 0.25 0.05 0.6 0.25 0.10 0.05
Updated: Based on suggestion from @dww column names can be assigned as:
df[paste0('color',1:4)] = t(apply(df, 1, sort, decreasing = TRUE))
# Black White Red Blue color1 color2 color3 color4
# 1 0.8 0.1 0.07 0.03 0.8 0.10 0.07 0.03
# 2 0.3 0.6 0.00 0.10 0.6 0.30 0.10 0.00
# 3 0.1 0.6 0.25 0.05 0.6 0.25 0.10 0.05
Upvotes: 2