Hmm

Reputation: 103

Sort a data frame based on another sorted column value in R

I have a data frame that is sorted on a numeric column in order to assign a rank. Where that numeric column is zero, those rows should instead be arranged by a character column.

To assign the rank I have to consider var2, which is why I sorted on var2. If rows have identical values in var2, I have to use var3 to break the tie. See rows 2 and 3 of the data frame below: their var2 values are identical, so var3 decides the rank. If var2 is zero, I have to sort those rows by var1 (a character column) in alphabetical order and rank them in that order. If var2 is NA, the row gets no rank.

Below, the data frame is sorted by var2 in descending order, but var2 also contains zeros. The rows with var2 equal to zero need to be sorted by var1 in alphabetical order, followed by the rows with var2 equal to NA, also in alphabetical order of var1.

    example:
    #      var1    var2    var3    rank
    # 1     c      556      45       1
    # 2     a      345      35       3
    # 3     f      345      64       2
    # 4     b      134      87       4
    # 5     z       0       34       5
    # 6     d       0       32       6
    # 7     c       0       12       7
    # 8     a       0       23       8
    # 9     e      NA      
    # 10    b      NA       

Below is my code:

library(dplyr)

df <- data.frame(var1=c("c","a","f","b","z","d", "c","a", "e", "b", "ad", "gf", "kg", "ts", "mp"), var2=c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0), var3=c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))

# Sort by var2, then var1, both in descending order
orderdf <- df[order(df$var2, df$var1, decreasing = TRUE), ] 

# Assign the rank, leaving it blank where var2 is NA
rankdf <- orderdf %>% mutate(rank = ifelse(is.na(var2),'', seq(1:nrow(orderdf))))

In the expected output, the rows where var2 is zero are sorted by var1 in alphabetical order:

    expected output:
    #      var1    var2    var3    rank
    # 1     c      556      45       1
    # 2     a      345      35       3
    # 3     f      345      64       2
    # 4     b      134      87       4
    # 5     a       0       34       5
    # 6     c       0       32       6
    # 7     d       0       12       7
    # 8     z       0       23       8
    # 9     b      NA      
    # 10    e      NA       

Upvotes: 1

Views: 2201

Answers (4)

Ric S

Reputation: 9247

With dplyr you can use

df %>% 
  arrange(desc(var2), var1)

and afterwards you create the column rank
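A minimal sketch of that last step, assuming rows with NA in var2 should get no rank, as the question states. The small data frame is made up for illustration and is not the asker's data.

```r
library(dplyr)

# Illustrative data only (hypothetical values)
df <- data.frame(
  var1 = c("z", "c", "a", "b", "e", "d"),
  var2 = c(0, 556, 0, 134, NA, 0),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  # var2 descending, ties broken alphabetically by var1; NAs end up last
  arrange(desc(var2), var1) %>%
  # rank follows the row position, but stays NA where var2 is NA
  mutate(rank = ifelse(is.na(var2), NA_integer_, row_number()))

ranked
```

Here the three rows with var2 equal to zero come out in the order a, d, z, and the NA row is placed last without a rank.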


EDIT

The following code is a bit cumbersome but it gets the job done. It sorts the rows where var2 is nonzero separately from the rows where var2 is zero or NA, then combines the two sorted data frames and finally creates the rank column.

Data

df <- data.frame(
  var1 = c("c","a","f","b","z","d", "c","a", "e", "z", "ad", "gf", "kg", "ts", "mp"), 
  var2 = c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0), 
  var3 = as.numeric(c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))
)
df
#    var1 var2 var3
# 1     c  134   65
# 2     a   NA   NA
# 3     f  345   45
# 4     b  200   34
# 5     z  556   68
# 6     d   NA   NA
# 7     c  345   73
# 8     a  200   12
# 9     e  150   35
# 10    z    0   23
# 11   ad   25   34
# 12   gf   10   56
# 13   kg    0   56
# 14   ts  150   78
# 15   mp    0  123

Code

library(dplyr)

df %>% 
  # rows where var2 is neither 0 nor NA: sort by var2, then var3, both descending
  filter(var2 != 0) %>% 
  arrange(desc(var2), desc(var3)) %>% 
  # append the rows where var2 is 0 or NA, sorted alphabetically by var1
  bind_rows(df %>% filter(var2 == 0 | is.na(var2)) %>% arrange(var1)) %>% 
  # re-sort on var2 so that the zeros come before the NAs
  arrange(desc(var2)) %>% 
  # create the rank column, leaving it NA where var2 is NA
  mutate(
    rank = seq_len(nrow(df)),
    rank = ifelse(is.na(var2), NA, rank)
  )

Output

#    var1 var2 var3 rank
# 1     z  556   68    1
# 2     c  345   73    2
# 3     f  345   45    3
# 4     b  200   34    4
# 5     a  200   12    5
# 6    ts  150   78    6
# 7     e  150   35    7
# 8     c  134   65    8
# 9    ad   25   34    9
# 10   gf   10   56   10
# 11   kg    0   56   11
# 12   mp    0  123   12
# 13    z    0   23   13
# 14    a   NA   NA   NA
# 15    d   NA   NA   NA

Upvotes: 2

Ronak Shah

Reputation: 388817

You can do this in base R using order():

cols <- c('var1', 'var2')
remaining_cols <- setdiff(names(df), cols)
df1 <- df[cols]
cbind(transform(df1[with(df1, order(-var2, var1)), ], 
                rank = seq_len(nrow(df1))), df[remaining_cols])


#   var1 var2 rank var3
#1     c  556    1   45
#2     a  345    2   35
#3     f  345    3   64
#4     b  134    4   87
#8     a    0    5   34
#7     c    0    6   32
#6     d    0    7   12
#5     z    0    8   23
#10    b   NA    9   10
#9     e   NA   10   11

data

df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L, 
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"), 
var2 = c(556L, 345L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA), 
var3 = c(45L, 35L, 64L, 87L, 34L, 32L, 12L, 23L, 10L, 11L
)), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

akrun

Reputation: 886948

Using data.table

library(data.table)
setDT(df)[order(-var2, var1)][, rank := seq_len(.N)][]

data

df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L, 
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"), 
var2 = c(1456L, 456L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA)), 
class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

rg255

Reputation: 4169

Using only base R's order() function, sort by var2 in descending order and then by var1 in ascending order, and pass the resulting integer vector to the square brackets:

df[order(-df$var2, df$var1), ]

Adding a rank column too is then just

df[order(-df$var2, df$var1), "rank"] <- 1:length(df$var1)
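The same order() call can be stretched to cover the other rules in the question: break ties in nonzero var2 by var3 (descending), sort the zero rows by var1, and leave the rank empty for NA. What follows is only a sketch under those assumptions. The name tie_key is a helper introduced here for illustration, and the data is the 15-row data frame from the question with var3 made numeric.

```r
df <- data.frame(
  var1 = c("c", "a", "f", "b", "z", "d", "c", "a", "e", "z", "ad", "gf", "kg", "ts", "mp"),
  var2 = c(134, NA, 345, 200, 556, NA, 345, 200, 150, 0, 25, 10, 0, 150, 0),
  var3 = c(65, NA, 45, 34, 68, NA, 73, 12, 35, 23, 34, 56, 56, 78, 123),
  stringsAsFactors = FALSE
)

# Second sort key (hypothetical helper): -var3 where var2 is nonzero,
# a constant where var2 is zero so that var1 decides the order there
tie_key <- ifelse(df$var2 == 0, 0, -df$var3)

# var2 descending, then the tie key, then var1 alphabetically;
# order() puts NA values of var2 last by default
sorted <- df[order(-df$var2, tie_key, df$var1), ]

# Rank by row position, leaving NA where var2 is NA
sorted$rank <- ifelse(is.na(sorted$var2), NA_integer_, seq_len(nrow(sorted)))
sorted
```

Unlike the assignment above, this reorders the rows into a new data frame rather than adding the rank to df in its original order.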

Upvotes: 0
