Reputation: 111
Hello dear community members. I'm trying to create rank variables for a subset of columns in R, ranking the values within each row. For example, I want to transform this data frame
> df
  X1 X2 X3 X4 X5
1  1  4  7  3  2
2  2  5  8  4  3
3  3  6  3  5  4
4  4  1  2  6  5
5  5  2  1  7  6
into
> df
  X1 X2 X3 X4 X5 x1_rank x2_rank x3_rank
1  1  4  7  3  2       3       2       1
2  2  5  8  4  3       3       2       1
3  3  6  3  5  4       3       1       3
4  4  1  2  6  5       1       3       2
5  5  2  1  7  6       1       2       3
that is, select X1 to X3 and rank them against each other within each row.
I tried this code:
for (i in 1:nrow(df)) {
df_rank <- df[i, ] %>%
dplyr::select(X1, X2, X3, X4) %>%
base::rank()
}
I imagine this can be solved with a for loop, but I'm a beginner in R and don't understand why this doesn't work.
Upvotes: 3
Views: 252
Reputation: 4419
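One way to achieve this is to rank the negated values of each row, so that the largest value gets rank 1, and to control how ties are handled with the ties.method argument of rank() (ties also works, through partial argument matching).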
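df <- tibble::tribble(
  ~x1, ~x2, ~x3, ~x4, ~x5,
  1, 4, 7, 3, 2,
  2, 5, 8, 4, 3,
  3, 6, 3, 5, 4,
  4, 1, 2, 6, 5,
  5, 2, 1, 7, 6
)
library(magrittr)
df %>%
  cbind(
    # Rank the negated values of x1..x3 within each row (largest value gets rank 1);
    # apply() returns the ranks column-wise, hence the t()
    t(apply(-df[, 1:3], 1, rank, ties.method = "min")) %>%
      {colnames(.) <- paste0(colnames(.), "_rank"); .}
  )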
  x1 x2 x3 x4 x5 x1_rank x2_rank x3_rank
1  1  4  7  3  2       3       2       1
2  2  5  8  4  3       3       2       1
3  3  6  3  5  4       2       1       2
4  4  1  2  6  5       1       3       2
5  5  2  1  7  6       1       2       3
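The tie handling matters for the third row, where x1 and x3 are both 3. As a quick check in base R, the following sketch compares two of the tie methods on that row alone (the variable names row3, rank_min and rank_max are only for illustration):

```r
# Third row of the example data: x1 and x3 are tied at 3
row3 <- c(x1 = 3, x2 = 6, x3 = 3)

# Negate so the largest value receives rank 1
rank_min <- rank(-row3, ties.method = "min")  # tied values share the lower rank
rank_max <- rank(-row3, ties.method = "max")  # tied values share the higher rank

print(rank_min)
print(rank_max)
```

With "min" the row becomes 2 1 2, as in the output above. With "max" it becomes 3 1 3, which is what the expected table in the question shows for that row, so switching the tie method may be closer to what was asked.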
As to why your code does not work: the for loop does not return anything. It only overwrites the variable df_rank on every iteration, so after the loop df_rank holds the ranks of the last row alone and df is never modified. To fix it, declare an object outside the loop, fill it in on each iteration, and finally bind it to the original data.
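# Pre-allocate one row of ranks per row of df
m <- matrix(nrow = nrow(df), ncol = 3)
for (i in seq_len(nrow(df))) {
  # Pull the i-th row of x1..x3 out as a plain numeric vector
  row_values <- df[i, ] %>%
    dplyr::select(x1, x2, x3) %>%
    unlist()
  # Negate so the largest value gets rank 1
  m[i, ] <- rank(-row_values, ties.method = "min")
}
colnames(m) <- paste0(names(df)[1:3], "_rank")
# bind_cols() comes from dplyr, which is not attached here, so it is qualified
df %>% dplyr::bind_cols(as.data.frame(m))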
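The same idea can also be wrapped in a small base-R helper that needs no extra packages. This is only a sketch: the function name add_row_ranks and its arguments are made up for illustration, and it assumes that at least two columns are selected and that all of them are numeric.

```r
# Hypothetical helper: append descending row-wise ranks for the chosen columns
add_row_ranks <- function(data, cols, ties.method = "min") {
  # Convert the selected columns to a numeric matrix
  values <- as.matrix(data[, cols, drop = FALSE])
  # Rank each row of the negated values; apply() returns one column per row, so transpose
  ranks <- t(apply(-values, 1, rank, ties.method = ties.method))
  colnames(ranks) <- paste0(cols, "_rank")
  cbind(data, ranks)
}

# Same example data as above, as a plain data frame
df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(4, 5, 6, 1, 2),
  x3 = c(7, 8, 3, 2, 1),
  x4 = c(3, 4, 5, 6, 7),
  x5 = c(2, 3, 4, 5, 6)
)

result <- add_row_ranks(df, c("x1", "x2", "x3"))
print(result)
```

Passing ties.method = "max" instead changes only the rows that contain ties.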
Upvotes: 1