Reputation: 103
I have a data frame that is sorted based on one column(numeric column) to assign the rank. if this column value is zero then arrange the data frame based on another character column for those rows which have zero as a value in a numeric column.
But to give rank I have to consider var2 that is the reason I sorted based on var2, if there is any identical values in var2 for those rows I have to consider var3 to give rank. please see the data frame 2 and 3 rows, var2 values are identical in that case i have to consider var3 to give rank. In case var2 is zero i have to sort the var1 column(character column) in alphabetical order and give rank. if var2 is NA no rank. please refer the data frame given below.
Below, the data frame is sorted based on var2 column descending order, but var2 contains zero also if var2 is zero I have to sort the data frame based on var1 for the rows which are having zero in var2. I need sort by var1 for those rows which are having var2 as zero and followed by NA in alphabetical order of var1.
example:
# var1 var2 var3 rank
# 1 c 556 45 1
# 2 a 345 35 3
# 3 f 345 64 2
# 4 b 134 87 4
# 5 z 0 34 5
# 6 d 0 32 6
# 7 c 0 12 7
# 8 a 0 23 8
# 9 e NA
# 10 b NA
below is my code
df <- data.frame(var1=c("c","a","f","b","z","d", "c","a", "e", "b", "ad", "gf", "kg", "ts", "mp"), var2=c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0), var3=c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))
# To break the tie between var3 and var2
orderdf <- df[order(df$var2, df$var1, decreasing = TRUE), ]
#assigning rank
rankdf <- orderdf %>% mutate(rank = ifelse(is.na(var2),'', seq(1:nrow(orderdf))))
expected output is sort the var1 in alphabetical order if var2 value is zero(for those rows with var2 value is zero)
expected output:
# var1 var2 var3 rank
# 1 c 556 45 1
# 2 a 345 35 3
# 3 f 345 64 2
# 4 b 134 87 4
# 5 a 0 34 5
# 6 c 0 32 6
# 7 d 0 12 7
# 8 z 0 23 8
# 9 b NA
# 10 e NA
Upvotes: 1
Views: 2201
Reputation: 9247
With dplyr
you can use
df %>%
arrange(desc(var2), var1)
and afterwards you create the column rank
The following code is a bit cumbersome but it gets the job done. Basically it orders the rows in which var2
is equal or different from zero separately, then combines the two ordered dataframes together and finally creates the rank
column.
Data
df <- data.frame(
var1 = c("c","a","f","b","z","d", "c","a", "e", "z", "ad", "gf", "kg", "ts", "mp"),
var2 = c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0),
var3 = as.numeric(c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))
)
df
# var1 var2 var3
# 1 c 134 65
# 2 a NA NA
# 3 f 345 45
# 4 b 200 34
# 5 z 556 68
# 6 d NA NA
# 7 c 345 73
# 8 a 200 12
# 9 e 150 35
# 10 z 0 23
# 11 ad 25 34
# 12 gf 10 56
# 13 kg 0 56
# 14 ts 150 78
# 15 mp 0 123
Code
df %>%
# work on rows with var2 different from 0 or NA
filter(var2 != 0) %>%
arrange(desc(var2), desc(var3)) %>%
# merge with rows with var2 equal to 0 or NA
bind_rows(df %>% filter(var2 == 0 | is.na(var2)) %>% arrange(var1)) %>%
arrange(desc(var2)) %>%
# create the rank column only for the rows with var2 different from NA
mutate(
rank = seq_len(nrow(df)),
rank = ifelse(is.na(var2), NA, rank)
)
Output
# var1 var2 var3 rank
# 1 z 556 68 1
# 2 c 345 73 2
# 3 f 345 45 3
# 4 b 200 34 4
# 5 a 200 12 5
# 6 ts 150 78 6
# 7 e 150 35 7
# 8 c 134 65 8
# 9 ad 25 34 9
# 10 gf 10 56 10
# 11 kg 0 56 11
# 12 mp 0 123 12
# 13 z 0 23 13
# 14 a NA NA NA
# 15 d NA NA NA
Upvotes: 2
Reputation: 388817
You can do it in base R, using order
:
cols <- c('var1', 'var2')
remaining_cols <- setdiff(names(df), cols)
df1 <- df[cols]
cbind(transform(df1[with(df1, order(-var2, var1)), ],
rank = seq_len(nrow(df1))), df[remaining_cols])
# var1 var2 rank var3
#1 c 556 1 45
#2 a 345 2 35
#3 f 345 3 64
#4 b 134 4 87
#8 a 0 5 34
#7 c 0 6 32
#6 d 0 7 12
#5 z 0 8 23
#10 b NA 9 10
#9 e NA 10 11
data
df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L,
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"),
var2 = c(556L, 345L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA),
var3 = c(45L, 35L, 64L, 87L, 34L, 32L, 12L, 23L, 10L, 11L
)), class = "data.frame", row.names = c(NA, -10L))
Upvotes: 0
Reputation: 886948
Using data.table
library(data.table)
setDT(df)[order(-var2, var1)][, rank := seq_len(.N)][]
df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L,
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"),
var2 = c(1456L, 456L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA)),
class = "data.frame", row.names = c(NA, -10L))
Upvotes: 0
Reputation: 4169
Using only base R's order()
function, sort first on descending order of var2
then ascending order of var1
to sort the data by passing the subsequent integer vector to square braces
df[order(-df$var2, df$var1), ]
Adding a rank column too is then just
df[order(-df$var2, df$var1), "rank"] <- 1:length(df$var1)
Upvotes: 0