Kellan Baker

Reputation: 385

Sorting a data frame by multiple variables

I have a data frame with 4 groups (defined by categories "a" and "b" in column 1 and categories "X" and "Y" in column 2). I want to sort the attributes in column 3 by their values in column 4 in descending order, but only within the groups defined by columns 1 and 2 (aX, aY, bX, bY).

How can I go from this:

col1    col2    col3    col4
a       X       pat     1
b       Y       dog     2
b       X       leg     3
a       X       hog     4
b       Y       egg     5
a       Y       log     6
b       X       map     7
b       Y       ice     8
b       X       mat     9
a       Y       sat     10

to this?

col1    col2    col3    col4
a       X       hog     4
a       X       pat     1
a       Y       sat     10
a       Y       log     6
b       X       mat     9
b       X       map     7
b       X       leg     3
b       Y       ice     8
b       Y       egg     5
b       Y       dog     2

(Code for the example input, df, and the expected output, df1, is below.)

col1 <- c('a','b','b','a','b','a','b','b','b','a')
col2 <- c('X','Y','X','X','Y','Y','X','Y','X','Y')
col3 <- c('pat','dog','leg','hog','egg','log','map','ice','mat','sat')
col4 <- c(1,2,3,4,5,6,7,8,9,10)

df <- data.frame(col1,col2,col3,col4)

colA <- c('a','a','a','a','b','b','b','b','b','b')
colB <- c('X','X','Y','Y','X','X','X','Y','Y','Y')
colC <- c('hog','pat','sat','log','mat','map','leg','ice','egg','dog')
colD <- c(4,1,10,6,9,7,3,8,5,2)

df1 <- data.frame(colA,colB,colC,colD)

I tried the following, but it sorts the whole data frame by col4 and keeps none of the ranked-within-groups structure that I want:

df %>% group_by(col1, col2) %>% arrange(desc(col4)) 

df %>% group_by(col1) %>% arrange(col1) %>% group_by(col2) %>% arrange(col2) sorts the data frame correctly by the first two columns, but I can't further arrange it by col4.

Upvotes: 1

Views: 190

Answers (2)

linog

Reputation: 6226

@akrun is right, no need for group_by. The equivalent implementation in data.table would be:

library(data.table)
setDT(df)
df[order(col1, col2, -col4)]
#     col1 col2 col3 col4
#  1:    a    X  hog    4
#  2:    a    X  pat    1
#  3:    a    Y  sat   10
#  4:    a    Y  log    6
#  5:    b    X  mat    9
#  6:    b    X  map    7
#  7:    b    X  leg    3
#  8:    b    Y  ice    8
#  9:    b    Y  egg    5
# 10:    b    Y  dog    2
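
For comparison, the same ordering can be sketched in base R with no extra packages. This is an illustration rather than part of the original answer; the name `sorted` is made up here, and negating col4 to get descending order assumes the column is numeric.

# Rebuild the example data from the question
df <- data.frame(
  col1 = c('a','b','b','a','b','a','b','b','b','a'),
  col2 = c('X','Y','X','X','Y','Y','X','Y','X','Y'),
  col3 = c('pat','dog','leg','hog','egg','log','map','ice','mat','sat'),
  col4 = c(1,2,3,4,5,6,7,8,9,10)
)

# order() returns a row permutation: ascending col1, then ascending col2,
# then descending col4 (the minus sign only works for numeric columns)
sorted <- df[order(df$col1, df$col2, -df$col4), ]
rownames(sorted) <- NULL  # drop the original row numbers
sorted

For a non-numeric sort key, something like order(..., decreasing = c(FALSE, FALSE, TRUE), method = "radix") should give mixed directions, but check ?order before relying on it.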

Upvotes: 2

akrun

Reputation: 887951

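For this, we don't need group_by. arrange() sorts by each column in turn, so listing col1 and col2 first keeps the groups together, and desc(col4) then orders the rows within each group: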

library(dplyr)
df %>%
    arrange(col1, col2, desc(col4))
#   col1 col2 col3 col4
#1     a    X  hog    4
#2     a    X  pat    1
#3     a    Y  sat   10
#4     a    Y  log    6
#5     b    X  mat    9
#6     b    X  map    7
#7     b    X  leg    3
#8     b    Y  ice    8
#9     b    Y  egg    5
#10    b    Y  dog    2
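
If the grouping is needed anyway for later steps, a possible variant is to keep group_by and tell arrange() to respect it. By default arrange() ignores groups, which is why the attempt in the question sorted the whole data frame by col4. The sketch below assumes a dplyr version that supports the .by_group argument; the name `by_group` is only illustrative.

library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  col1 = c('a','b','b','a','b','a','b','b','b','a'),
  col2 = c('X','Y','X','X','Y','Y','X','Y','X','Y'),
  col3 = c('pat','dog','leg','hog','egg','log','map','ice','mat','sat'),
  col4 = c(1,2,3,4,5,6,7,8,9,10)
)

by_group <- df %>%
    group_by(col1, col2) %>%
    # .by_group = TRUE sorts by the grouping columns first, then by desc(col4)
    arrange(desc(col4), .by_group = TRUE) %>%
    ungroup()   # drop the grouping once the rows are ordered
by_group

This should give the same row order as arrange(col1, col2, desc(col4)), so the shorter form above is usually enough.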

Upvotes: 2
