cryptickey

Reputation: 308

Grouping by `cut` with `ordered_result = TRUE` does not list the groups in level order

When I group a data.table by the result of cut with ordered_result = TRUE, the groups are not listed in increasing order of the break labels. Instead they appear in the order in which the labels are first encountered in the data.table, which is the same behaviour as with ordered_result = FALSE. Why does data.table not take the ordered factor into account?

> aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
> aaa <- rev(aaa)
> d <- data.table(x = 1:length(aaa), val = aaa)
> # The following statement will not order the group by result using the ordered labels in increasing fashion.
> d[, sum(x), by = cut(aaa, 3, ordered_result = TRUE)]
         cut V1
1:  (5,7.01]  3
2:     (3,5] 22
3: (0.994,3] 41
> # In fact, the behavior is the same as with ordered_result = FALSE
> d[, sum(x), by = cut(aaa, 3, ordered_result = FALSE)]
         cut V1
1:  (5,7.01]  3
2:     (3,5] 22
3: (0.994,3] 41

Upvotes: 0

Views: 241

Answers (1)

Miff

Reputation: 7941

Whether a factor is ordered mostly affects how it is treated in statistical models (this is alluded to in ?factor, though without much detail).
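As a rough illustration of where ordering does matter in base R, the sketch below builds the same factor twice, once unordered and once ordered. The vector `sizes` and its levels are made-up example data, and the contrast names shown assume R's default `options("contrasts")`.

```r
# Hypothetical example data: three labels with a natural order
sizes <- c("low", "high", "mid")
lvls  <- c("low", "mid", "high")

unordered <- factor(sizes, levels = lvls)
ordered_f <- factor(sizes, levels = lvls, ordered = TRUE)

# Comparison operators are defined for ordered factors
print(ordered_f < "high")

# For an unordered factor the same comparison yields NA (with a warning)
print(suppressWarnings(unordered < "high"))

# Model fitting picks different default contrasts:
# treatment contrasts for unordered, polynomial contrasts for ordered
print(colnames(contrasts(unordered)))
print(colnames(contrasts(ordered_f)))
```

Neither of these differences has anything to do with the row order of a grouped result, which is why the two calls in the question print identically.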

Grouping with by does not sort the result by the grouping values, whether or not they form an ordered factor; the groups are returned in the order in which they first appear in the data. To get a sorted result, use the keyby argument instead:

d[, sum(x), keyby = cut(aaa, 3)]
#         cut V1
#1: (0.994,3] 41
#2:     (3,5] 22
#3:  (5,7.01]  3 
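If the grouping has to stay as by (for example, to keep first-appearance order for an intermediate step), the result can also be sorted afterwards. The sketch below assumes the data from the question; it groups on the column `val` rather than the global `aaa`, and the names `bin`, `total`, `res` and `sorted` are illustrative only.

```r
library(data.table)

# Data from the question
aaa <- rev(c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7))
d <- data.table(x = seq_along(aaa), val = aaa)

# Group with by: rows come back in order of first appearance
res <- d[, .(total = sum(x)),
         by = .(bin = cut(val, 3, ordered_result = TRUE))]

# Sorting on a factor column follows the order of its levels
sorted <- res[order(bin)]
print(sorted)

# The grouping column is still an ordered factor after sorting
print(is.ordered(sorted$bin))
```

Unlike keyby, this leaves the result without a key, which may or may not matter depending on what is done with it next.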

In your example, the factor ordering does work correctly, in that the cut column remains an ordered factor. Compare the following:

str(d[, sum(x), by = cut(aaa, 3, ordered_result = TRUE)])
#Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
# $ cut: Ord.factor w/ 3 levels "(0.994,3]"<"(3,5]"<..: 3 2 1
# $ V1 : int  3 22 41
# - attr(*, ".internal.selfref")=<externalptr> 

str(d[, sum(x), by = cut(aaa, 3, ordered_result = FALSE)])
#Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
# $ cut: Factor w/ 3 levels "(0.994,3]","(3,5]",..: 3 2 1
# $ V1 : int  3 22 41
# - attr(*, ".internal.selfref")=<externalptr> 

Note the change in the class of cut from Ord.factor to Factor.

Upvotes: 3
