TobiSonne
TobiSonne

Reputation: 1107

Sort data.table by grouping variable with condition

I have the following data.table (in fact my data.table is much bigger (more groups and more other variables)):

Data <- data.table(Group = rep(c("a", "b"), each = 3),
                   Var = 1:6)

> print(Data)
   Group Var
1:     a   1
2:     a   2
3:     a   3
4:     b   4
5:     b   5
6:     b   6

Now I want to sort the data.table based on the variable Group but only if Group == "a". My poor attempt was the following:

> Data[Group == "a", .SD[.N:1]]
   Group Var
1:     a   3
2:     a   2
3:     a   1

I know why this is wrong but I can not think of a solution that leads to my desired output:

   Group Var
1:     a   3
2:     a   2
3:     a   1
4:     b   4
5:     b   5
6:     b   6

Upvotes: 0

Views: 323

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388817

Your attempt would work with

library(data.table)
Data[Group == "a"] <- Data[Group == "a", .SD[.N:1]]
Data
#    Group Var
#1:     a   3
#2:     a   2
#3:     a   1
#4:     b   4
#5:     b   5
#6:     b   6

But the above just reverses the rows if you want to sort the rows in decreasing order based on Var you could do

Data[Group == "a"] <- Data[Group == "a", .SD[order(-Var)]]

If you have multiple columns you can do

cols <- c("a", "b")
Data[Group %in% cols] <- Data[Group %in% cols,.SD[order(-Var)],Group]

A better approach with data.table is to update by reference as suggested by @markus

Data[Group %in% cols, Var := .SD[order(-Var)]$Var]

Upvotes: 2

Vidhya G
Vidhya G

Reputation: 2320

Without using the .SD notation:

> Data[Group == "a", Var := sort(Var, decreasing = TRUE)]
> Data

   Group Var
1:     a   3
2:     a   2
3:     a   1
4:     b   4
5:     b   5
6:     b   6

As you have a large data.table and more groups, you may wish to consider the use of the .I notation (see the article by @nathaneastwood) as it does result in better performance in some situations. Here, .I will identify the row numbers of interest. Let us change the example so that we are interested in two groups:

Data <- data.table(Group = rep(c("a", "b", "c"), each = 3), Var = 10:18)

Then:

> Data[Data[, .I[Group  %in% c("a", "c")]], Var := sort(Var, decreasing = TRUE), by = Group]

> Data
   Group Var
1:     a  12
2:     a  11
3:     a  10
4:     b  13
5:     b  14
6:     b  15
7:     c  18
8:     c  17
9:     c  16

For completeness the basic idea is contained in:

Data[Group %in% c("a", "c"), Var:= sort(Var, decreasing = TRUE), by = Group]

Upvotes: 2

Related Questions